<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Informatics @ Northwestern Weblog</title>
	<atom:link href="http://informatics.northwestern.edu/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://informatics.northwestern.edu/blog</link>
	<description>NUBIC is a team dedicated to creating web applications and software tools expressly for clinical and translational research at Northwestern</description>
	<lastBuildDate>Thu, 16 Feb 2012 19:46:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Continuous Delivery reading and resource list</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=continuous-delivery-reading-and-resource-list</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 19:36:11 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[continuous delivery]]></category>
		<category><![CDATA[continuous deployment]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[release management]]></category>
		<category><![CDATA[software deployment]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=572</guid>
		<description><![CDATA[My technical project for the year, I&#8217;ve decided, is to build a continuous delivery system inside the NUBIC dev team. Here&#8217;s a quick reading list of source materials that I&#8217;m using to learn how to do it (blog posts to &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My technical project for the year, I&#8217;ve decided, is to build a continuous delivery system inside the NUBIC dev team.</p>
<p>Here&#8217;s a quick reading list of source materials that I&#8217;m using to learn how to do it (blog posts to follow as I document the process of building the system internally):</p>
<ul>
<li>The <a href="http://www.amazon.com/Continuous-Delivery-Deployment-Addison-Wesley-ebook/dp/B003YMNVC0/ref=sr_1_1?s=digital-text&amp;ie=UTF8&amp;qid=1329420497&amp;sr=1-1">Continuous Delivery book</a></li>
<li>The <a href="http://continuousdelivery.com/">Continuous Delivery blog</a></li>
<li><a href="http://radar.oreilly.com/2009/03/continuous-deployment-5-eas.html">Eric Ries&#8217; 5-step primer to Continuous Deployment</a></li>
<li><a href="http://www.startuplessonslearned.com/2010/01/case-study-continuous-deployment-makes.html">Case Study: Continuous deployment makes releases non-events</a></li>
<li><a href="http://engineering.imvu.com/2010/04/09/imvus-approach-to-integrating-quality-assurance-with-continuous-deployment/">IMVU&#8217;s Approach to Integrating Quality Assurance with Continuous Deployment</a></li>
<li><a href="http://venturehacks.com/articles/five-whys">A series on the principle of &#8220;The 5 whys&#8221;</a></li>
<li><a href="http://en.wikipedia.org/wiki/Pareto_principle">The Pareto principle, a.k.a. the 80/20 rule</a></li>
</ul>
<p>These things couple very well with additional practices that NUBIC embraces as part of its software development process, including:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Test-driven_development">Test Driven Development</a></li>
<li><a href="http://en.wikipedia.org/wiki/Behavior_Driven_Development">Behavior Driven Development</a></li>
<li><a href="http://en.wikipedia.org/wiki/Continuous_Integration">Continuous Integration</a></li>
<li><a href="https://github.com/nubic">Releasing much of our code as open source</a></li>
<li><a href="http://en.wikipedia.org/wiki/Kaizen">Kaizen &#8211; continuous improvement philosophy</a></li>
<li><a href="http://www.joelonsoftware.com/articles/fog0000000043.html">Doing well on the &#8220;Joel Test&#8221;</a></li>
</ul>
<p>Finally, <a href="http://www.thoughtworks-studios.com/">ThoughtWorks Studios</a> has a commercial product called <a href="http://www.thoughtworks-studios.com/go-agile-release-management">Go</a> for automated release management. A couple of people from ThoughtWorks also happen to be the authors of the book on Continuous Delivery.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Information leakage, and the many places our data goes</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=information-leakage-and-the-many-places-our-data-goes</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 17:55:08 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[anthony castillo]]></category>
		<category><![CDATA[authentication security]]></category>
		<category><![CDATA[leaky information]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[stackoverflow]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=730</guid>
		<description><![CDATA[Digital information is, without careful design, pretty leaky. It gets transferred all over the place, it gets logged, it gets backed up, and it often lives in those backups for a long time. This post discusses just one way that &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Digital information is, without careful design, pretty leaky. It gets transferred all over the place, it gets logged, it gets backed up, and it often lives in those backups for a long time.</p>
<p>This post discusses just one way that sensitive information is transparently taken out of your hands, and put into places you didn&#8217;t expect, or weren&#8217;t obvious from the use of a given technology.</p>
<p>My example is this: putting username/password combinations <strong>in a URL</strong>, as outlined in <a href="http://stackoverflow.com/questions/4980912/username-and-password-in-https-url">this question on StackOverflow</a> (you&#8217;ll probably want to read through this, for some background). While the implementation question posed is a simple one, and may make certain programming tasks simpler, it&#8217;s turned out to be a bad idea, because of the potential of that information leaking into unintended places.</p>
<p>As a follow up to that StackOverflow question, I received an email from <a href="http://twitter.com/#/alanthonyc">Anthony Castillo</a> (you can follow his blog at <a href="http://inquirious.com/">inquirious.com</a>), that delved deeper into this problem, specifically in an iOS context. The discussion is germane to anyone thinking about the unintended consequences of transferring data over networks:</p>
<blockquote><p>Hi Jeff,</p>
<p>[...]</p>
<p>I asked a question similar to one of yours on SO (<a href="http://stackoverflow.com/questions/4980912/username-and-password-in-https-url" target="_blank">regarding https://username:password@service.com type urls</a>).</p>
<p>I&#8217;m trying to decide if it is safe *enough* for me to use that format for an iOS project I&#8217;m working on. Just wondering if you went with it or if you ended up doing something else.</p>
<p>&nbsp;</p></blockquote>
<p>I replied&#8230;</p>
<blockquote><p>Hey Anthony, [...]</p>
<p>The short answer to your follow up is &#8220;no &#8211; it&#8217;s not considered good enough&#8221;. Security, to some extent, is a matter of managing risk. Username+password in HTTPS links potentially increases your risk with basically no reward. Any logs on systems between you and the destination system that capture that information (or even non-secure logs on your own machine), effectively makes using HTTPS useless for this purpose, since you wind up putting your credentials in the clear.</p>
<p>No bueno.</p>
<p>I originally wrote this question because I was trying to automate a connection between a script on my local machine, and my source repository, thereby preventing a password prompt from coming up at all, so the script could run unattended. The problem I was trying to solve was automating a web app deployment. However, I found that this wasn&#8217;t the best thing to do, for the following reasons:</p>
<ol>
<li>It&#8217;s better for security by far to use key-based authentication (rather than username+password), and connect over SSH for deployments and other automated processes, rather than HTTPS. The reasons for this are many, but the benefits are basically that automation can happen pretty easily after you get this set up, and because you&#8217;re not using passwords, the encryption is generally considered less likely to be broken, since your &#8220;secret key&#8221; is a high-security key, rather than a human-chosen password.</li>
<li>Having fully automated deployments was troublesome without <strong>also</strong> having an automatic rollback process. In my case, if a developer wasn&#8217;t involved in the deployment, that was a bigger problem than the inconvenience of having to be physically present to enter one&#8217;s password. [...]</li>
</ol>
<p>Back to your question, HTTPS connections are, as I understand them, tunnels directly between the client and the server, so theoretically security should be okay. However, you never know how many machines that request goes through before that tunnel is established (your ISP/wireless provider, for example, is one place that might get a hold of the URL <strong>with</strong> the username+password in it). Ideally, whatever code library you&#8217;re using to make the connection should drop the username+password from the URL while establishing the tunnel; however, there&#8217;s no guarantee, and it&#8217;s just better never to store your username+password in plain text <strong>anywhere</strong> that you don&#8217;t absolutely have to do so, especially since there are much better, and more secure, alternatives that are just as easy to set up. In any case where you&#8217;ve determined that it is absolutely necessary to store credentials in plain text (I can&#8217;t think of any in the modern world in which we live), it&#8217;s critical that the file system on which it is stored is a system <strong>completely within your control, and not accessible via the Internet, </strong>lest someone get access to it. This includes not only the file system itself, but any locations/systems, onsite or off, to which that file system is backed up. In the real world, guaranteeing such a situation is difficult at best, and in practical terms, is probably closer to impossible.</p>
<p>As for your iOS app, that&#8217;s a bit trickier, depending on your situation, because you&#8217;re not talking about one key, and one server, you&#8217;re talking about a bunch of users, and a server they connect to. However, I would think that you could achieve key-based authentication for your users by giving them an API/application key that they could copy/paste into your iOS app that would essentially accomplish the same thing. If you need to go over HTTPS, you can pass the key in as a POST parameter to the session, and allow secure data transfer between your app and the server, in addition to the key acting as authentication (this user is who they say they are). I&#8217;m not an iOS expert, but one of my colleagues mentioned that iOS offers some encrypted storage on the device to store sensitive information such as credentials, and that it&#8217;s been available to iOS devices at least since iOS v4 (give or take). [..]</p></blockquote>
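<p>To make the &#8220;key as a POST parameter&#8221; idea concrete, here is a minimal sketch in Python. The endpoint, the <code>api_key</code> field name, and the key value are all hypothetical; a real service defines its own. The request is only built, not sent, so the point is simply that the secret travels in the request body and never appears in the URL.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and key, used only for illustration.
SERVICE_URL = "https://service.example.com/session"
API_KEY = "example-key-123"


def build_session_request(url, api_key):
    """Build (but do not send) a POST request that carries the key in the body."""
    body = urlencode({"api_key": api_key}).encode("ascii")
    # The body goes in data=, so the URL itself stays free of secrets.
    return Request(url, data=body, method="POST")


request = build_session_request(SERVICE_URL, API_KEY)
print(request.get_method())  # POST
print(request.full_url)      # the URL, with no key in it
```

<p>Because the URL carries no secret, anything that records URLs (server access logs, proxy logs, shell history) has nothing sensitive to capture.</p>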
<p>Anthony replies&#8230;</p>
<blockquote><p>Hi Jeff, [...]</p>
<p>&nbsp;</p>
<p>After reading your email, I started digging around the web a bit more. Please let me know what you think of the following:</p>
<p><a href="http://en.wikipedia.org/wiki/Basic_access_authentication" target="_blank"><strong>Basic Access Authentication</strong></a> - this is the &#8220;<em>username:password@service.com</em>&#8221; format that we have been discussing. This is as opposed to <a href="http://en.wikipedia.org/wiki/Digest_Access_Authentication" target="_blank">Digest Access Authentication</a>. (There may be others as well.)</p>
<ul>
<li>Basic authentication can be used with <em>either</em> HTTP or HTTPS.</li>
<li>With HTTP, it is definitely insecure.</li>
<li>When used with HTTPS, that implies SSL encryption over the whole connection.</li>
</ul>
<p>So&#8230;I <em>think</em> it&#8217;s okay to use basic access authentication, <strong>as long as</strong> it&#8217;s over HTTPS. (<a href="http://stackoverflow.com/questions/3464454/https-and-basic-authentication#3464462" target="_blank">1</a>) (<a href="http://en.wikipedia.org/wiki/HTTP_Secure" target="_blank">2</a>)</p>
<p>However, the point you make about not knowing how many machines a request goes through before reaching the server leaves me with slight doubts.</p>
<p>Perhaps we have slightly different choices here because you have control over both the client and server sides of the connection. Hence your ability to choose a protocol involving an API key.</p>
<p>As for me, I am building an app against a third-party web service. This means I have to go along with whatever protocol they have in place. In this case, it&#8217;s username/password authentication over HTTPS.</p>
<p>Maybe it <em>isn&#8217;t</em> completely secure, but it might still be the best I can do given my choices. (I think that&#8217;s my conclusion.) I don&#8217;t believe there is a way for me to force a different authentication method on the service to which I&#8217;m connecting.</p>
<p>Regarding encrypted storage on iOS, that is indeed a standard feature offered by Apple. They have what is known as a &#8220;Keychain&#8221; feature that allows you to store and encrypt data on an app by app basis. I basically let the OS handle the details of that for me. (Pretty nice.)</p></blockquote>
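<p>For readers who want to see what Basic authentication looks like without the credentials sitting in the URL, here is a rough Python sketch. The function name <code>split_credentials</code> is made up for this example. It takes a <em>username:password@host</em> style URL, returns the same URL with the credentials removed, and moves them into an <code>Authorization</code> header instead. Note that Basic authentication only base64-encodes the credentials; it does not encrypt them, which is why it still depends on HTTPS.</p>

```python
import base64
from urllib.parse import urlsplit, urlunsplit, unquote


def split_credentials(url):
    """Return (clean_url, headers) with any user:password moved out of the URL."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    # Keep everything after the "user:password@" prefix of the network location.
    host = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    # Basic auth is an encoding, not encryption: only send this over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return clean_url, {"Authorization": f"Basic {token}"}


clean_url, headers = split_credentials("https://username:password@service.com/api")
print(clean_url)  # https://service.com/api
```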
<p>And finally&#8230;</p>
<blockquote><p>Anthony,</p>
<p>In your situation, it sounds like you&#8217;re right &#8211; if this is the only authentication method provided (and they don&#8217;t publish per-user API keys) you&#8217;re kind of up the creek on that one. You should still avoid putting the username/password in the URL itself if at all possible, depending on the limitations of the API requests. I would, if possible, contact the vendor and ask that they add it [key-based authentication]. In the meantime, I think you&#8217;re right that you&#8217;re stuck, according to what you&#8217;ve outlined.</p>
<p>The digest authentication looks like your best bet, but isn&#8217;t considered as secure as key-based authentication (as noted in the Wikipedia article you linked to), but it looks like you already know that. [...]</p></blockquote>
<p>So, it&#8217;s always a good idea to think about where your information is not only stored, but where it&#8217;s transferred to, and what that might mean in terms of that information getting out of your control inadvertently. Thanks to <a href="http://twitter.com/#/alanthonyc">Anthony Castillo</a> for the deeper discussion, and a specific example to which we could apply the principle.</p>
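<p>One small, practical guard against this kind of leak is to scrub credentials from any URL before it is written to a log. The sketch below is one way to do that in Python; the function name <code>redact_url</code> and the <code>REDACTED</code> placeholder are choices made for this example, not part of any standard.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url):
    """Return the URL with any user:password replaced, safe to write to a log."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url
    # Keep the host (and port), drop whatever came before the "@".
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, "REDACTED@" + host, parts.path, parts.query, parts.fragment))


print(redact_url("https://username:password@service.com/repo.git"))
# https://REDACTED@service.com/repo.git
```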
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why RTFM doesn&#8217;t work</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=why-rtfm-doesnt-work</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 20:30:07 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[5-whys]]></category>
		<category><![CDATA[rtfm]]></category>
		<category><![CDATA[step-by-step instructions]]></category>
		<category><![CDATA[technical support]]></category>
		<category><![CDATA[training issues]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=601</guid>
		<description><![CDATA[It&#8217;s not the users&#8217; fault. Honestly, it&#8217;s not. When answering a technical support question, have you ever asked someone, &#8220;Did you read the manual?&#8221; Well, put away your superiority complex for a moment, and realize that your users are wondering &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<p>It&#8217;s <strong>not</strong> the users&#8217; fault. Honestly, it&#8217;s not.</p>
<p>When answering a technical support question, have you ever asked someone, &#8220;Did you read the manual?&#8221; Well, put away your superiority complex for a moment, and realize that your users are wondering why they need a manual in the first place.</p>
<p>Manuals stink, plain and simple, so stop using them whenever possible. If you&#8217;ve got a complex application, website (or really any training process whatsoever), and you feel that you aren&#8217;t receiving the respect you deserve for writing that 900-page, 100% comprehensive training manual, stop spending time on trying to improve the manual, and instead change the system.</p>
</div>
<div>Here are a few things you can try that are very simple, and very effective:</div>
<div></div>
<ul>
<li><strong>Ask why</strong> &#8211; first and foremost you need to understand why the user is having a problem with your application, then you need to correct the flaw that is causing the problem in the first place, thereby eliminating the need for the user to ask the question at all. A great method for accomplishing this is <a title="5 Whys" href="http://en.wikipedia.org/wiki/5_Whys">5 Whys</a>. During the process of asking &#8220;why&#8221; it&#8217;s important to always be gracious about honest feedback, and curious about how people arrive at their state of confusion. Once you&#8217;ve figured out what&#8217;s at the root of the problem, it&#8217;s usually a trivial thing to change it.</li>
<li><strong>Show, don&#8217;t tell</strong> &#8211; create a short training video that shows people how to use it, rather than trying to explain it via text and pictures. If your training video can&#8217;t correctly explain it in less than three minutes, your app is either too complex, or your video is trying to do too much. Either fix your app, or sharpen the focus of your video. Great examples of awesome instructional videos are the videos that <a href="http://help.squarespace.com/customer/portal/articles/14410-squarespace-platform-overview-video-">introduce SquareSpace</a>. They are short, focused on a single topic each, and (in the case of SquareSpace) linked directly from the pages in which the related question might be raised in the user&#8217;s mind. A user is editing a webpage and wants to know how to add an image? The video for editing pages is linked from the page editing screen. Simple. It&#8217;s true that they still maintain a searchable collection of videos that any user can simply watch, but the fact of the matter is that pretty much no one is going to go through this library and watch all the videos <strong>first</strong>. Users will typically try something, and only when they fail, will they ask for help.</li>
<li><strong>Protect users from accidents</strong> &#8211; there are many times that users will do things that they don&#8217;t know are dangerous until it&#8217;s too late, and they can&#8217;t go back! Whenever possible, provide an &#8220;undo&#8221; function that allows users to fix mistakes with a simple click or keystroke. This method is often far superior to shifting all responsibility to the user, and presenting them with, &#8220;Are you sure?! You cannot undo this!&#8221; sorts of messages. Those messages make users fearful, cause them to stop and call you for help making a decision about what to do, and ultimately shift blame to the user when simply providing an &#8220;undo&#8221; function largely prevents the problem from happening in the first place. Even the most seasoned users will occasionally make mistakes. These people aren&#8217;t &#8220;dumb&#8221;; they&#8217;re just human, after all. Do you really want to have to recover lost data, or blame them for the mistake, when your system could simply protect users from such accidents in the first place?</li>
<li><strong>Automate it</strong> &#8211; sometimes people make mistakes when doing repetitive tasks, because humans aren&#8217;t as good at doing highly repetitive things accurately 100% of the time, as compared to computers. This problem is exacerbated by processes that have multiple steps, where a mistake in any one of the steps can cause the whole process to break down. Try helping the users of your site or application by pre-filling in values for forms, automatically inserting reasonable default values, or better yet, just completely automate the process whenever possible. If there&#8217;s no reason that a human really needs to be involved in a process, take them out of the loop and save everyone some time and energy.</li>
<li><strong>Language is imprecise</strong> &#8211; step-by-step instructions, no matter how detailed and precise, no matter how carefully worded, are difficult to follow. Users get lost in lengthy instructions, misunderstand or misinterpret technical terms, and simply don&#8217;t want to read instructions anyway. Providing users with a glossary of terms (thinking that the manual should explain itself) isn&#8217;t really the answer either. So, use pictures instead of words when possible, and video instead of pictures when possible. The difficulty of interpreting language is part of why IKEA&#8217;s assembly instructions contain no words, only pictures.</li>
</ul>
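<p>As a rough illustration of the &#8220;undo&#8221; suggestion above, here is a small Python sketch. The class name <code>UndoableList</code> and the page names are invented for the example. Instead of asking &#8220;Are you sure?&#8221;, each destructive change records how to reverse itself.</p>

```python
class UndoableList:
    """A list of items where every destructive change can be reversed."""

    def __init__(self, items):
        self.items = list(items)
        self._undo_stack = []  # callables that reverse earlier changes

    def delete(self, index):
        removed = self.items.pop(index)
        # Remember how to put the item back instead of warning the user up front.
        self._undo_stack.append(lambda: self.items.insert(index, removed))

    def undo(self):
        # Reverse the most recent change, if there is one.
        if self._undo_stack:
            self._undo_stack.pop()()


pages = UndoableList(["home", "about", "contact"])
pages.delete(1)
print(pages.items)  # ['home', 'contact']
pages.undo()
print(pages.items)  # ['home', 'about', 'contact']
```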
<div>
<div id="attachment_705" class="wp-caption alignnone" style="width: 721px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/ikea_assembly_instructions.jpg"><img class="size-full wp-image-705" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/ikea_assembly_instructions.jpg" alt="" width="711" height="492" /></a><p class="wp-caption-text">The introductory page that explains how to avoid damaging your new furniture during assembly, and what to do if you need help or are confused. Pretty clear, yes? (1) put a carpet or rug under the pieces while assembling them, (2) if you&#039;re confused, look in the manual for a picture that shows what to do, and (3) call IKEA. Note that the last picture isn&#039;t a person on a phone calling IKEA - it&#039;s literally a handset connected to IKEA. When I see this, I think only two words: &quot;phone IKEA&quot;. The implication is uncomplicated, and clear. Also note this caption of those four pictures took an entire paragraph. Not very efficient, friendly, or helpful, is it?</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ETL Assistant – Getting Error Row Description and Column Information Dynamically</title>
		<link>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=etl-assistant-getting-error-row-description-and-column-dynamically</link>
		<comments>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 07:15:40 +0000</pubDate>
		<dc:creator>Eric Whitley</dc:creator>
				<category><![CDATA[EDW]]></category>
		<category><![CDATA[ETL Assistant]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=498</guid>
		<description><![CDATA[SSIS does a fine job at letting you manage "garden path" ETL, but many face the challenge of how to manage row failures. Which row failed? Why did it fail? Error row handling is a central part of any development task and usually winds up representing a significant chunk of your time and code. In this article we'll step you through how to overcome SSIS's design-time-only availability of error row information by creating a runtime dynamic error row handler using CozyRoc's tool kit for SSIS. <a href="http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This article is going to attempt to provide one solution to the question of row error management in SSIS.  It&#8217;s one option, specially constructed for dynamic column mapping scenarios, but could probably be exploited for static situations as well.</p>
<h2>TLDR:</h2>
<ul>
<li>Download the sample package (<a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/dynamic_dft_error_handler_example.zip">Dynamic DFT Error Handler Sample Project</a>)</li>
<li>Run the SQL script</li>
<li>Open up the package and make sure your connections are set up appropriately</li>
</ul>
<h2>Management of Bad Rows in SSIS</h2>
<p>For ETL, SSIS does a fine job at letting you manage the basics of copying one column of data in some source table to another column of data in a destination table.  Assuming all goes well, you wind up extracting/transforming/loading that data.</p>
<p>If things don&#8217;t go well, however&#8230;</p>
<p>Exception handling is a central part of any development task and usually winds up representing a significant chunk of your time and code. You wind up covering any number of &#8220;what ifs&#8221; like:</p>
<ul>
<li>What if I failed to connect to a system?</li>
<li>What if I expected data and didn&#8217;t get any?</li>
<li>What if my expected data type overflowed?</li>
<li>What if something totally unanticipated happened?</li>
</ul>
<p><a style="font-style: normal; line-height: 24px; text-decoration: underline;" href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_1_vanilla.png"><img class="size-full wp-image-649 alignright" style="border-style: initial; border-color: initial; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #eeeeee;" title="etl_asst_error_log_1_vanilla" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_1_vanilla.png" alt="ETL Assistant Error Logger - Basic SSIS DFT Without Error Handling" width="202" height="185" /></a>If you&#8217;ve used SSIS for ETL you&#8217;re accustomed to the idea of data flow paths inside of a transformation.  You connect a source component to a destination component via either a green line (&#8220;good output&#8221;) or a red line (&#8220;bad / error output&#8221;).  This is great stuff.  Say you query some rows from a source database table and want to send the rows to a destination database table &#8211; you simply wire up the green line from the source to the destination and map the columns.  Done.  Walk away.</p>
<p>But what about the implied red line for bad rows?  What if you actually have an issue with the transformation?  Two immediate reasons come to mind:</p>
<ul>
<li>The data was truncated in some way (cast my Oracle number(20,0) to a SQL int)</li>
<li>Some other unanticipated error occurred (for the sake of explanation, let&#8217;s say a primary key violation on insert)</li>
</ul>
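<p>The truncation case is easy to picture outside of SSIS. The following Python sketch (an illustration only, not part of the sample package) checks whether a whole number, such as one read from an Oracle number(20,0) column, would fit into a 32-bit SQL int. The function name <code>fits_sql_int</code> is made up for this example, and it assumes the value is a plain Python integer.</p>

```python
# Range of a 32-bit signed integer, which is what a SQL Server int can hold.
INT32_RANGE = range(-2**31, 2**31)


def fits_sql_int(value):
    """Return True when an integer value can be stored in a SQL int without overflow."""
    return value in INT32_RANGE


print(fits_sql_int(2147483647))            # True
print(fits_sql_int(12345678901234567890))  # False
```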
<p><a style="font-style: normal; line-height: 24px; text-decoration: underline; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;" href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_2_error_redirection.png"><img class="size-full wp-image-651 alignright" style="border-style: initial; border-color: initial; margin-top: 0.4em; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #eeeeee;" title="etl_asst_error_log_2_error_redirection" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_2_error_redirection.png" alt="ETL Assistant - SSIS DFT Error Row Redirection" width="314" height="267" /></a></p>
<p>Usually what you&#8217;d do with a static transformation is simply use row redirection to handle the exception.  A common solution is to log your error information to a shared error log table for later review.  By attaching the appropriate error output to your destination you &#8220;channel&#8221; the row information to that destination so you have a hope of figuring out what happened and what you can do about it.</p>
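<p>The same &#8220;redirect the bad row to an error log&#8221; pattern can be sketched in plain Python with an in-memory SQLite database. This is only an analogy for what the red line does in a data flow; the table names, column names, and sample rows are assumptions made for the example, and SSIS does none of this through Python.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE error_log (source_row TEXT, error_description TEXT)")


def load_rows(conn, rows):
    """Insert rows one at a time; send failures to error_log instead of aborting."""
    for row in rows:
        try:
            conn.execute("INSERT INTO employee (employee_id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError as exc:
            # The "red line": keep the row and the reason for later review.
            conn.execute(
                "INSERT INTO error_log (source_row, error_description) VALUES (?, ?)",
                (repr(row), str(exc)),
            )


# The third row repeats a primary key, so it is redirected to error_log.
load_rows(conn, [(1, "Ada"), (2, "Grace"), (1, "Duplicate")])
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0])   # 2
print(conn.execute("SELECT COUNT(*) FROM error_log").fetchone()[0])  # 1
```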
<p>SSIS usually works really well for these situations, with the exception of two nagging challenges you&#8217;ll see come up a <em>lot</em> in discussion forums:</p>
<ul>
<li>&#8220;My row failed &#8211; how do I get the error description?&#8221;</li>
<li>&#8220;My row failed &#8211; how do I tell which row failed?&#8221;</li>
</ul>
<p>Error description is fairly straightforward and I&#8217;m not going to get into it too much &#8211; there&#8217;s a great step-by-step example at (<a href="http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx">http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx</a>) which is very instructive.</p>
<p>Error row identifier, though, is a bit more complex because of the way SSIS works.</p>
<h2>Error Columns and Lineage IDs</h2>
<div style="border: 1px solid yellow; background-color: #ffffcc; padding: 5px;">I&#8217;m going to preface this next section by noting that I don&#8217;t have a super clear picture on the internals of how SSIS column flow works, but I get a sense of it.  Please feel free to comment / email me and I&#8217;ll update anything that needs correcting.</div>
<p>&nbsp;</p>
<p>Let&#8217;s say you have a row with an integer column &#8220;employee_id&#8221; which is the primary key on a table.  What you see is a single presentation of that column &#8220;employee_id&#8221; &#8211; it&#8217;s labeled that way throughout your data transformation flow, so to you it&#8217;s &#8220;the same&#8221; throughout the flow.  What SSIS sees internally, however, is something completely different.  If you dig a bit you&#8217;ll find you have a <em>unique</em> representation of this column at each point throughout the flow of your SSIS package.  That single &#8220;column&#8221; (&#8220;employee_id&#8221;) has to be treated uniquely at each input, output, and error output for each step.  Beyond needing to understand how to treat flow direction (ex: input column vs output column), the column itself may change data types, names, or even value as it flows through your package.  SSIS needs to keep track of that &#8220;column&#8221; at each point throughout the flow and treat it as though it&#8217;s unique.  So how does it do that?  LineageID.</p>
<p>There&#8217;s a great article on SQL Server Central (<a href="http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/65730/" target="_blank">http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/65730/</a>) that touches on some of this.  The article describes the Lineage ID as:</p>
<blockquote><p>It’s an integer and it’s unique throughout the data flow. When buffers are reused, the Lineage ID doesn’t change – it’s the same column at the input and output. When buffers are copied, a new column is created – which gets a new (unique) Lineage ID.</p></blockquote>
<p>That means that as the column &#8220;employee_id&#8221; flows through the DFT, it gets a unique Lineage ID -<em> for each input and output copy of itself</em>.  And, typically, you have&#8230;</p>
<ul>
<li>An input column</li>
<li>An output column for &#8220;good&#8221; data</li>
<li>An output column for errors</li>
</ul>
<p>Taking the &#8220;employee_id&#8221; example from the &#8220;OLE DB Source&#8221; step in our DFT we&#8217;d have:</p>
<ul>
<li>Input (ID = 33)</li>
<li>Source Error Output (Lineage ID = 35)</li>
<li>Good Output (Lineage ID = 34)</li>
</ul>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_6_lineage_id_11.png"><img class="alignnone size-full wp-image-663" title="etl_asst_error_log_6_lineage_id_1" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_6_lineage_id_11.png" alt="ETL Assistant - SSIS DFT Lineage ID Flow" width="674" height="339" /></a></p>
<p>Great!  No problem.  As long as we know the LineageIDs related to our steps we can backtrack to determine the mapping to &#8220;column name&#8221; and voila &#8211; we know which column in the row failed.  We can simply look up the column by LineageID using &#8220;FindColumnByLineageID&#8221; in a script task (<a href="http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtsbuffermanager100.findcolumnbylineageid.aspx">http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtsbuffermanager100.findcolumnbylineageid.aspx</a>).  Magic.</p>
<p>Not so fast.  One small, but critical catch.  Metadata about a task step is only available within the scope of that task step. Meaning &#8211; once we get past &#8220;OLE DB Source&#8221; I can see &#8220;Lineage ID,&#8221; but I can&#8217;t easily track back to determine the <em>mapping</em> of Lineage ID to column name.  So &#8211; if you want to write out error row information (specifically &#8220;column name&#8221;) in a second DFT (to your error log, for example) there&#8217;s no way to look up that name &#8211; because the metadata about LineageID is no longer in scope &#8211; it&#8217;s only available to the <em>prior</em> step.  Incredibly frustrating.</p>
<h2>Getting Error Column Information With Static DFTs</h2>
<p>For static packages this can be addressed a few ways. The general strategy is to map the Lineage IDs / IDs to column information at <em>design time</em> and then use that information to look up the information you need.</p>
<p>A couple of quick links you may find handy:</p>
<ul>
<li>How to Find Out Which Column Caused SSIS to Fail? (<a href="http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx">http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx</a>)</li>
<li>Error Output&#8217;s Description (Component on CodePlex) (<a href="http://eod.codeplex.com/">http://eod.codeplex.com/</a> )</li>
<li>eLog (<a href="http://ssisctc.codeplex.com/wikipage?title=eLog&amp;referringTitle=Home">http://ssisctc.codeplex.com/wikipage?title=eLog&amp;referringTitle=Home</a>)</li>
</ul>
<p>Again &#8211; for static packages, these can mostly if not completely solve the issue and leave you in a far better position to determine which rows failed.  I&#8217;m not going to go into these since you can read up online.</p>
<h2>So What About a Dynamic DFT?</h2>
<p>Note that the links I provided above address <em>design time</em> gathering / mapping of column information.   What do you do about a <em>runtime</em> situation?  We started digging into the CozyRoc dynamic DFT about a year ago.  Basic dynamic mappings worked <em>great</em>.  You can easily remap columns at runtime and, assuming all goes well, you&#8217;re done.  But if things don&#8217;t go well &#8211; what then?</p>
<p>We need to catch and log those bad rows.  But &#8211; we can&#8217;t map columns / Lineage ID information at design time because that negates the entire point of using a dynamic DFT &#8211; you won&#8217;t know <em>any</em> of the required information. It&#8217;s just not there.  Now that issue with the resolution of metadata from prior steps comes into play.  We can&#8217;t generate column information at design time and we can&#8217;t inspect metadata from ancestor steps within a DFT.  They&#8217;re out of scope.</p>
<p>I&#8217;ll admit that when I first looked at this I was stumped.  And incredibly frustrated.  There was this great opportunity to really let SSIS <strong><em>rock</em></strong> using CozyRoc&#8217;s dynamic DFT, but the inability to handle bad rows in a data warehousing solution is a showstopper (keep in mind the issue here is an <em>SSIS design constraint</em>, <strong><em>not</em></strong> a CozyRoc fault).  Following the examples for handling static mappings online (thank you very much, above-linked article authors), we had the notion that we should be able to pull some of the DFT information out at runtime and approach the problem somewhat similarly.</p>
<ul>
<li>Upon startup, obtain a list of all columns, their IDs, and their Lineage IDs</li>
<li>Store that list in a collection</li>
<li>Use the IDs / Lineage IDs from the errors to look up the corresponding record in our collection</li>
<li>Profit</li>
</ul>
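<p>The steps above can be sketched roughly as follows.  This is a minimal, standalone illustration rather than the code from the sample project: the <code>ColumnInfo</code> class, the <code>ColumnLookupSketch</code> helper, and the ID values are all hypothetical, and nothing here touches the SSIS object model.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System;
using System.Collections.Generic;
&nbsp;
// Hypothetical holder for the column details we want to recover later.
public sealed class ColumnInfo
{
    public int Id { get; set; }
    public int LineageId { get; set; }
    public string Name { get; set; }
}
&nbsp;
public static class ColumnLookupSketch
{
    // Index the columns by Lineage ID so an error row's ID resolves in one lookup.
    public static Dictionary&lt;int, ColumnInfo&gt; BuildMap(IEnumerable&lt;ColumnInfo&gt; columns)
    {
        var map = new Dictionary&lt;int, ColumnInfo&gt;();
        foreach (ColumnInfo column in columns)
        {
            map[column.LineageId] = column;
        }
        return map;
    }
&nbsp;
    // Returns the column name, or a placeholder when the ID was never registered.
    public static string ResolveName(Dictionary&lt;int, ColumnInfo&gt; map, int lineageId)
    {
        ColumnInfo column;
        return map.TryGetValue(lineageId, out column) ? column.Name : "(unknown column)";
    }
&nbsp;
    public static void Main()
    {
        // Illustrative values only; real IDs are assigned by SSIS at runtime.
        var map = BuildMap(new[]
        {
            new ColumnInfo { Id = 34, LineageId = 34, Name = "employee_id" }
        });
        Console.WriteLine(ResolveName(map, 34)); // prints "employee_id"
        Console.WriteLine(ResolveName(map, 99)); // prints "(unknown column)"
    }
}</pre></div></div>

<p>Keying the dictionary on the Lineage ID keeps the error-time lookup to a single call, and the placeholder return value means an unregistered ID degrades the log entry rather than failing the error handler itself.</p>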
<p>I rang up CozyRoc and discussed the situation with their engineers.  They immediately understood my intentions and mailed me back a quick sample of some code that exploited a fantastic capability of their dynamic DFT &#8211; the ability to <em>add script to the DFT itself</em>. (Thanks, CozyRoc!)  Not code via a script task <em>within</em> the DFT, but on the DFT directly.</p>
<p>CozyRoc DFT+ (<a href="http://www.cozyroc.com/ssis/data-flow-task">http://www.cozyroc.com/ssis/data-flow-task</a>) notes that you can apply script on the DFT by accessing&#8230;</p>
<ul>
<li><strong>Advanced</strong> tab &#8211; specifies advanced task options.</li>
<li><strong>Script</strong> page &#8211; specifies data flow task script, which is used for <strong>Setup</strong> tab customization.</li>
</ul>
<p>Aha.  And the magic snippet they supplied me&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">public <span style="color: #993333;">void</span> OnColumnAdded<span style="color: #009900;">&#40;</span>IDTSComponentMetaData100 component<span style="color: #339933;">,</span> bool isInput<span style="color: #339933;">,</span> string colName<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">//do stuff</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Great!  They provided event hooks for the dynamic column mapping!  So now I can detect when a column is added to the DFT flow, add it to my reference collection of column information, and then access that collection within the DFT to derive column information critical to error logging.</p>
<p>This will let me take &#8220;Lineage ID&#8221; 12345 at <em>any</em> point throughout the flow and figure out that it was column &#8220;employee_name_concat&#8221; or whatever and log that.  We&#8217;re in business.</p>
<p>Something to note here.  Handling row truncation behavior is trickier when you&#8217;re doing this dynamically.  You can no longer manually address the need to &#8220;redirect on truncation&#8221; on a column by column basis, so you just extend the magic DFT+ column binding event to do it for you.</p>
<p>&nbsp;</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>isInput<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
      IDTSOutputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">OutputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">OutputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      column.<span style="color: #202020;">TruncationRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
      column.<span style="color: #202020;">ErrorRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Done. Setting row disposition behavior accomplished.</p>
<p>From there we wrote up the nastier parts of the whole exercise &#8211; the entire collection lookup mechanism to derive column information.  We did that as a script task within the body of the DFT.</p>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_7_dft_script.png"><img class="alignnone size-full wp-image-671" title="etl_asst_error_log_7_dft_script" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_7_dft_script.png" alt="ETL Assistant - SSIS DFT Error Script Task" width="344" height="556" /></a></p>
<p>The script task pulls rows out of the buffer and evaluates row position to resolve the Lineage ID / ID and determine&#8230;</p>
<ul>
<li>Source column name (EX: &#8220;first_name&#8221;)</li>
<li>Source primary key name (EX: &#8220;employee_id&#8221;)</li>
<li>Source primary key value (EX: &#8220;12345&#8221;)</li>
<li>Error description (using ComponentMetaData.GetErrorDescription)</li>
<li>Error data (so we can quickly eyeball the offending column)</li>
</ul>
<p>You&#8217;ll note I said &#8220;primary key name&#8221; &#8211; we felt it was &#8220;good enough&#8221; for the moment to avoid dealing with compound keys.  That&#8217;s definitely a shortcoming, but for the time being we felt that was acceptable since it matched our existing static ETL error handling process.  It&#8217;s definitely something that needs to be addressed, though.  We also cheat by explicitly passing in the primary key as an element of the process (we derive it at an earlier step) &#8211; again, in consulting speak, an &#8220;opportunity for improvement.&#8221;</p>
<h2>Putting it All Together</h2>
<p>Now that we&#8217;ve touched on the ideas, let&#8217;s see it work.  Rather than walk you through the entire step-by-step process of building a package I&#8217;m going to suggest you download the <a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/dynamic_dft_error_handler_example.zip">Dynamic DFT Error Handler Sample Project</a>.  I&#8217;ll quickly touch on the major points on how the sample works.</p>
<p>The download includes some SQL scripts to set up&#8230;</p>
<ul>
<li>[<strong>etl_proxy_table</strong>, <strong>etl_proxy_table_stg</strong>, <strong>etl_proxy_table_src</strong>] We use some fake placeholder &#8220;proxy&#8221; tables so you can set up data bindings in the DFT+.  CozyRoc also suggests you use THUNK_COLUMNs to do this, but I&#8217;ve found using these placeholder tables to be very helpful.  The reason we use these is that the magic OnColumnAdded method <em>only fires when a column is actually added</em> to the DFT. If you statically map any of the columns the entire error handling approach will fail because we won&#8217;t have those &#8220;static&#8221; columns added to our column collection.  Huge thank-you to CozyRoc for clueing me in on that.</li>
<li>[<strong>etl_errors</strong>] our error logging table. YMMV, but remember if you change this you also need to adjust the scripts in the DFT.</li>
<li>[<strong>demo_source_table, demo_dest_table</strong>] our source and destination tables.  We&#8217;re big Simpsons fans over here, so I&#8217;ve provided appropriate sample data.</li>
</ul>
<p>The overall package has a few steps:</p>
<ul>
<li><strong>["Set Table Information"]</strong> - A cheater <strong>Script Task</strong> to mimic pulling table configuration information.  In a production scenario you&#8217;d likely want to provide configuration elements from either a config file or, better yet, a configuration table.</li>
<li><strong>["SQL Get First table_keycol name"]</strong> - An <strong>Execute SQL</strong> task which we&#8217;ll use to pull out primary key information from our destination table.  This just uses INFORMATION_SCHEMA to look up your target table and pull back the first column for the primary key.  If you use unique constraints or something else, just tweak the SQL or overwrite the destination variable.</li>
<li><strong>["Truncate Destination Table"]</strong> - A second <strong>Execute SQL</strong> task to truncate our destination table (for a full load)</li>
<li><strong>["Data Flow Task Plus"]</strong> - A<strong> CozyRoc DFT+</strong> task for our dynamic loading.  The brains of the operation.</li>
</ul>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_5_process_flowl.png"><img class="alignnone size-full wp-image-682" title="etl_asst_error_log_5_process_flowl" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_5_process_flowl.png" alt="ETL Assistant - Dynamic Error Handling - Overall Package Step Flow" width="710" height="595" /></a></p>
<p>We also have variables.  In our production deployment we have lots and lots of variables.</p>
<p>The major points here are:</p>
<ul>
<li><strong>table_colmap</strong> is a System.Object that is our collection of column names, IDs, and Lineage IDs for all columns in our DFT.  I scoped this to our DFT+ task because it&#8217;s specific to that task, but you could get away with scoping it to the package.</li>
<li>Everything else.  We&#8217;re more or less mimicking the variables we used in previous articles.</li>
</ul>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_3_variables.png"><img title="etl_asst_error_log_3_variables" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_3_variables.png" alt="ETL Assistant - Dynamic Error - Variables" width="654" height="381" /></a></p>
<p>Let&#8217;s move on to the DFT.  Open up the DFT+.  You&#8217;re going to see two main paths:</p>
<ul>
<li>We had an issue obtaining the source data.  (right side) Yes.  This does happen.  Case in point &#8211; you have a date of &#8220;-4444 AD&#8221; in Oracle.  The OLEDB driver we use for Oracle really doesn&#8217;t like that.  Or even a 44 digit numeric.</li>
<li>We had an issue writing to the destination table. (left side)</li>
</ul>
<div>In both paths we simply channel the error rows to our error handler script task to process the buffer and do its magic.  I cheat by seeding the flow with additional error columns we overwrite within the task.  Mainly because I&#8217;m too lazy to magically add columns to the buffer myself from within the script task.</div>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_4_error_flow_final.png"><img class="alignnone size-full wp-image-685" title="etl_asst_error_log_4_error_flow_final" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_4_error_flow_final.png" alt="ETL Assistant - Dynamic Error Flow DFT+" width="492" height="652" /></a></p>
<p>Let&#8217;s give it a whirl and see what happens.</p>
<p>I&#8217;ve intentionally created opportunities for problems.</p>
<table width="564" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="118" />
<col width="30" />
<col width="111" />
<col width="77" />
<col width="33" />
<col width="117" />
<col width="78" /> </colgroup>
<tbody>
<tr>
<td width="118" height="21"><strong>Column</strong></td>
<td width="30"></td>
<td width="111"><strong>Source</strong></td>
<td width="77"></td>
<td width="33"></td>
<td width="117"><strong>Destination</strong></td>
<td width="78"></td>
</tr>
<tr>
<td height="17"><strong>column_name</strong></td>
<td></td>
<td><strong>DATA_TYPE</strong></td>
<td><strong>MAX_LEN</strong></td>
<td></td>
<td><strong>DATA_TYPE</strong></td>
<td><strong>MAX_LEN</strong></td>
</tr>
<tr>
<td height="17">employee_id</td>
<td></td>
<td>int</td>
<td>NULL</td>
<td></td>
<td>int</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">employee_guid</td>
<td></td>
<td>uniqueidentifier</td>
<td>NULL</td>
<td></td>
<td>uniqueidentifier</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">email_addr</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">15</span></td>
</tr>
<tr>
<td height="17">first_nm</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">10</span></td>
</tr>
<tr>
<td height="17">last_nm</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">10</span></td>
</tr>
<tr>
<td height="17">awesomeness</td>
<td></td>
<td>bigint</td>
<td>NULL</td>
<td></td>
<td><span style="color: #ff0000;">int</span></td>
<td>NULL</td>
</tr>
<tr>
<td height="17">create_dts</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">modified_dts</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
</tr>
</tbody>
</table>
<p>The destination columns will have conversion issues with:</p>
<ul>
<li>email_addr length</li>
<li>first_nm length</li>
<li>last_nm length</li>
<li>awesomeness (rating) size</li>
</ul>
<table width="567" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="81" />
<col width="161" />
<col width="107" />
<col width="101" />
<col width="117" /> </colgroup>
<tbody>
<tr>
<td width="81" height="17"><strong>employee_id</strong></td>
<td width="161"><strong>email_addr</strong></td>
<td width="107"><strong>first_nm</strong></td>
<td width="101"><strong>last_nm</strong></td>
<td width="117"><strong>awesomeness</strong></td>
</tr>
<tr>
<td align="right" height="17">1</td>
<td>jjones@test.org</td>
<td>Jimbo</td>
<td>Jones</td>
<td align="right">25</td>
</tr>
<tr>
<td align="right" height="17">2</td>
<td><span style="color: #ff0000;">captain@test.org</span></td>
<td>Horatio</td>
<td>McCallister</td>
<td align="right">100000</td>
</tr>
<tr>
<td align="right" height="17">3</td>
<td>homer@test.org</td>
<td>Homer</td>
<td>Simpson</td>
<td align="right">25000</td>
</tr>
<tr>
<td align="right" height="17">4</td>
<td>marge@test.org</td>
<td>Marjorie</td>
<td>Simpson</td>
<td align="right"><span style="color: #ff0000;">250000000000</span></td>
</tr>
<tr>
<td align="right" height="17">5</td>
<td><span style="color: #ff0000;">cruiser@test.org</span></td>
<td>Waylon</td>
<td>Smithers</td>
<td align="right">100</td>
</tr>
<tr>
<td align="right" height="17">6</td>
<td>bart@test.org</td>
<td><span style="color: #ff0000;">Bartholomew</span></td>
<td>Simpson</td>
<td align="right">25</td>
</tr>
<tr>
<td align="right" height="17">7</td>
<td><span style="color: #ff0000;">lisasimpson@test.org</span></td>
<td>Lisa</td>
<td>Simpson</td>
<td align="right">25</td>
</tr>
</tbody>
</table>
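<p>As a rough cross-check of the sample data against the destination limits, here is a small standalone sketch.  It is not part of the sample project: the <code>SampleDataCheck</code> class and the order in which columns are checked are assumptions made for illustration, and SSIS itself decides which column it reports for a failing row.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System;
&nbsp;
public static class SampleDataCheck
{
    // Destination limits taken from the column comparison table above.
    const int EmailMax = 15;
    const int FirstNameMax = 10;
    const int LastNameMax = 10;
&nbsp;
    // Returns the first column that would not fit the destination, or null if the row fits.
    // The check order (email, first name, last name, awesomeness) is an assumption.
    public static string FirstOffendingColumn(string email, string first, string last, long awesomeness)
    {
        if (email.Length &gt; EmailMax) return "email_addr";
        if (first.Length &gt; FirstNameMax) return "first_nm";
        if (last.Length &gt; LastNameMax) return "last_nm";
        if (awesomeness &gt; int.MaxValue || awesomeness &lt; int.MinValue) return "awesomeness";
        return null;
    }
&nbsp;
    public static void Main()
    {
        // Row 1 fits; rows 2, 4 and 6 each trip a different limit.
        Console.WriteLine(FirstOffendingColumn("jjones@test.org", "Jimbo", "Jones", 25) ?? "ok");
        Console.WriteLine(FirstOffendingColumn("captain@test.org", "Horatio", "McCallister", 100000) ?? "ok");
        Console.WriteLine(FirstOffendingColumn("marge@test.org", "Marjorie", "Simpson", 250000000000L) ?? "ok");
        Console.WriteLine(FirstOffendingColumn("bart@test.org", "Bartholomew", "Simpson", 25) ?? "ok");
    }
}</pre></div></div>

<p>The sketch stops at the first offending column in each row.  Row 2 also carries an 11-character last_nm, but the 16-character email_addr is checked first, so that is the column it names.</p>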
<p>If we run the package and review our error log we&#8217;ll see failures related to the highlighted columns.  (Note that I&#8217;ve removed some elements of the exception log here solely for formatting)</p>
<p>&nbsp;</p>
<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="49" height="17">error_id</td>
<td width="47">record_id</td>
<td width="59">record_id_dsc</td>
<td width="73">column_nm</td>
<td width="82">error_id</td>
<td width="515">error_dsc</td>
<td width="161">error_data</td>
</tr>
<tr>
<td align="right" height="34">8</td>
<td align="right">2</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>captain@test.org</td>
</tr>
<tr>
<td align="right" height="34">9</td>
<td align="right">4</td>
<td>employee_id</td>
<td>awesomeness</td>
<td align="right">-1071607686</td>
<td width="315">Conversion failed because the data value overflowed the type used by the provider.</td>
<td align="right">250000000000</td>
</tr>
<tr>
<td align="right" height="34">10</td>
<td align="right">5</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>cruiser@test.org</td>
</tr>
<tr>
<td align="right" height="34">11</td>
<td align="right">6</td>
<td>employee_id</td>
<td>first_nm</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>Bartholomew</td>
</tr>
<tr>
<td align="right" height="34">12</td>
<td align="right">7</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>lisasimpson@test.org</td>
</tr>
</tbody>
</table>
<p>&#8220;You&#8217;re failing, Seymour! What is it about you and failure?&#8221;</p>
<p>There you go &#8211; row exceptions being logged for various issues with data from the dynamic DFT.</p>
<h2>How Denali Should Fix This</h2>
<p>We&#8217;re eagerly anticipating Denali for several reasons, but one fantastic piece of news is that SSIS in Denali should let us bypass most if not all of the issues with LineageID.  As Jorg Klein notes in one of his blog posts (<a href="http://sqlblog.com/blogs/jorg_klein/archive/2011/07/22/ssis-denali-ctp3-what-s-new.aspx">http://sqlblog.com/blogs/jorg_klein/archive/2011/07/22/ssis-denali-ctp3-what-s-new.aspx</a>):</p>
<blockquote><p>SSIS always mapped columns from source to transformations or destinations with the help of lineage ids. Every column had a unique metadata ID that was known by all components in the data flow. If something changed in the source this would break the lineage ids and raised error messages like: The external metadata column collection is out of synchronization with the data source columns.<br />
To fix this error you would re-map all broken lineage ids with the “Restore Invalid Column References Editor”.<br />
In Denali lineage-ids are no longer used. Mappings are done on column names, which is great because you can now use auto map on column names and even copy/paste pieces of another data flow and connect them by mapping the corresponding column names.</p></blockquote>
<p>Fan.  Tastic.  Couldn&#8217;t come soon enough.  Granted, you&#8217;ll have to upgrade to Denali to make use of this, but there are so many other compelling reasons to migrate (<a href="http://www.brentozar.com/sql/sql-server-denali-2011-2012/">http://www.brentozar.com/sql/sql-server-denali-2011-2012/</a>) that this is just icing on the cake.</p>
<p>&nbsp;</p>
<h1>Appendix &#8211; Code</h1>
<p>This code is provided in the download, but for quick access / reference I&#8217;m also including it here.</p>
<h2>DFT+ Column Collection Script</h2>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System<span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Data</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Windows</span>.<span style="color: #202020;">Forms</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/*
&nbsp;
 Add references to ...
     CozyRoc.SSISPlus.2008
     Microsoft.SqlServer.DTSPipelineWrap
     Microsoft.SQLServer.DTSRuntimeWrap 
&nbsp;
 */</span>
&nbsp;
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using CozyRoc.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">SSIS</span>.<span style="color: #202020;">Attributes</span><span style="color: #339933;">;</span>
using CozyRoc.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">SSIS</span><span style="color: #339933;">;</span>
&nbsp;
using System.<span style="color: #202020;">Collections</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Collections</span>.<span style="color: #202020;">Generic</span><span style="color: #339933;">;</span>
&nbsp;
namespace ST_44af5cee356540e294c47d0aa17d41ed.<span style="color: #202020;">csproj</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #009900;">&#91;</span>System.<span style="color: #202020;">AddIn</span>.<span style="color: #202020;">AddIn</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;ScriptMain&quot;</span><span style="color: #339933;">,</span> Version <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;1.0&quot;</span><span style="color: #339933;">,</span> Publisher <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #339933;">,</span> Description <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>
    <span style="color: #009900;">&#91;</span>DataFlowColumnAdded<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;OnColumnAdded&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #666666; font-style: italic;">//CozyRoc annotation</span>
    public partial class ScriptMain <span style="color: #339933;">:</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Tasks</span>.<span style="color: #202020;">ScriptTask</span>.<span style="color: #202020;">VSTARTScriptObjectModelBase</span>
    <span style="color: #009900;">&#123;</span>
&nbsp;
        <span style="color: #339933;">#region VSTA generated code</span>
        <span style="color: #000000; font-weight: bold;">enum</span> ScriptResults
        <span style="color: #009900;">&#123;</span>
            Success <span style="color: #339933;">=</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">DTSExecResult</span>.<span style="color: #202020;">Success</span><span style="color: #339933;">,</span>
            Failure <span style="color: #339933;">=</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">DTSExecResult</span>.<span style="color: #202020;">Failure</span>
        <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
        <span style="color: #339933;">#endregion</span>
&nbsp;
        public <span style="color: #993333;">void</span> OnColumnAdded<span style="color: #009900;">&#40;</span>IDTSComponentMetaData100 component<span style="color: #339933;">,</span> bool isInput<span style="color: #339933;">,</span> string colName<span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>isInput<span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
                    IDTSOutputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">OutputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">OutputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                    column.<span style="color: #202020;">TruncationRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
                    column.<span style="color: #202020;">ErrorRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>isInput<span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
&nbsp;
                    IDTSInputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">InputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">InputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                    Dictionary colmap <span style="color: #339933;">=</span> new Dictionary<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                    Variables variables <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">null</span><span style="color: #339933;">;</span>
&nbsp;
                    try
                    <span style="color: #009900;">&#123;</span>
                        Dts.<span style="color: #202020;">VariableDispenser</span>.<span style="color: #202020;">LockOneForWrite</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #339933;">,</span> ref variables<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span>.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> colmap.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            colmap <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>Dictionary&lt;int, string&gt;<span style="color: #009900;">&#41;</span>variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span><span style="color: #339933;">;</span>
                        <span style="color: #009900;">&#125;</span>
                        <span style="color: #b1b100;">else</span>
                        <span style="color: #009900;">&#123;</span>
                        <span style="color: #009900;">&#125;</span>
                        colmap.<span style="color: #202020;">Add</span><span style="color: #009900;">&#40;</span>column.<span style="color: #202020;">ID</span><span style="color: #339933;">,</span> column.<span style="color: #202020;">Name</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                        variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span> <span style="color: #339933;">=</span> colmap<span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//put the column collection back into the variable</span>
                    <span style="color: #009900;">&#125;</span>
                    catch <span style="color: #009900;">&#40;</span>Exception exi<span style="color: #009900;">&#41;</span>
                    <span style="color: #009900;">&#123;</span>
                    <span style="color: #009900;">&#125;</span>
                    finally
                    <span style="color: #009900;">&#123;</span>
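                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>variables <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> variables.<span style="color: #202020;">Unlock</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//variables stays null if the lock was never acquired</span>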
                    <span style="color: #009900;">&#125;</span>
                <span style="color: #009900;">&#125;</span>
            <span style="color: #009900;">&#125;</span>
            catch
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #009900;">&#125;</span>
&nbsp;
        public <span style="color: #993333;">void</span> Main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
&nbsp;
            Dts.<span style="color: #202020;">TaskResult</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span><span style="color: #009900;">&#41;</span>ScriptResults.<span style="color: #202020;">Success</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h2>Error Row Handler Script</h2>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/* Microsoft SQL Server Integration Services Script Component
*  This is CozyRoc Script Component Plus Extended Script
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/</span>
&nbsp;
using System<span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Text</span><span style="color: #339933;">;</span>
&nbsp;
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
&nbsp;
using System.<span style="color: #202020;">Collections</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Collections</span>.<span style="color: #202020;">Generic</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//for our dictionaries / lists</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/*
 * HOW IT WORKS:
 * ======================================================================
 * Thanks to CozyRoc's great sample code (thanks, CozyRoc! :), we're able to rip through the
 * set of columns and find the error info critical for our logging / fixing.  We get the basic
 * column info on PreExecute() and store the column names, column lineage IDs, and column relative
 * position (&quot;index&quot;) in two separate dictionaries for later.  We use LineageID as the key for those,
 * and then later, during Input_ProcessInputRow, we look up those names and IDs so we can pull back
 * data from the buffer and then also UPDATE the buffer to overwrite our custom error info columns.
 *
 * Dictionary 1: Set of column names (&quot;colnames&quot;) key: LineageID, value: column.Name
 * Dictionary 2: Set of column relative positions (&quot;colids&quot;) key: LineageID, value: colIndex
 *
 * PreExecute - set up objects for later.  IDs for columns, dictionaries, variables, etc.
 * Input_ProcessInputRow - the &quot;real work&quot; of adjusting / setting the values in the columns
 *
 * SETUP - READ THIS OR IT WON'T WORK
 * ======================================================================
 * REQUIRED INPUT COLUMNS
 * -------
 * We anticipate the following input columns being present (sent to the script task as inputs)
 *
 * Standard &quot;Error Output&quot; columns from tasks
 * ------
 * ErrorColumn      MSFT - The Lineage ID for the error column
 * ErrorCode        MSFT - The SSIS error code
 *
 * Additional error columns specific to our purposes.  You can reuse these or update the column names
 * ------
 * error_id         CUSTOM - Same as the SSIS error code, but we need them for our table
 * column_nm        CUSTOM - The name of the column where the error occurred
 * record_id_dsc    CUSTOM - the column name for the &quot;primary key&quot; column (EX: employee_id)
 * record_id        CUSTOM - the value/ID for the &quot;primary key&quot; column so you can look up the row later
 *                              EX:&quot;12345&quot; in column &quot;employee_id&quot;
 *
 * error_id         CUSTOM - the SSIS error (same as ErrorCode, but for my purposes we left it here)
 * error_dsc        CUSTOM - the human-readable description of the SSIS error EX: &quot;The data was truncated.&quot;
 *
 * REQUIRED VARIABLES
 * -------
 * NOTE: You MUST set these up as read-only variables within your script task.
 *
 * Package Variable: @table_colmap (dictionary) - the collection of column names and IDs for our dynamic columns
 *                                              - this is set in the outer DFT+ OnColumnAdded()
 *                                              - we use this to pull out the full list of columns, since we can't get ahold
 *                                                of the prior step's column IDs/LineageIDs when we're in this script task
 *
 * Package Variable: @table_keycol (string) - the name of the column that represents your primary key
 *                                              EX: &quot;employee_id&quot;
 *
 * This is a cheap hack, but for my situation I'm OK with that.  We don't necessarily know what a &quot;key&quot;
 * column is at this point - primary key, I mean here.  So to get around that we set that value in a variable
 * within the overall package.  We then use that variable to say &quot;oh, that's the key column&quot; later and retrieve
 * the column name and the column value so we can write out our primary key reference info.  You'll see the
 * obvious limitation - we don't support compound primary keys.  But neither does my logging table, so...
 * 
&nbsp;
*/</span>
<span style="color: #009900;">&#91;</span>Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">SSISScriptComponentEntryPointAttribute</span><span style="color: #009900;">&#93;</span>
public class ScriptMain <span style="color: #339933;">:</span> UserComponent
<span style="color: #009900;">&#123;</span>
&nbsp;
    private <span style="color: #993333;">int</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> m_idx<span style="color: #339933;">;</span>
&nbsp;
    private string key_col_name<span style="color: #339933;">;</span>        <span style="color: #666666; font-style: italic;">//&quot;name&quot; of primary key column.  EX: &quot;employee_id&quot;.</span>
    private <span style="color: #993333;">int</span> key_col_id<span style="color: #339933;">;</span>             <span style="color: #666666; font-style: italic;">//Relative column index / position of our &quot;primary key&quot; column</span>
    <span style="color: #666666; font-style: italic;">//single primary key column.  Does not handle compound primary keys.  Retrieve this from a package variable since we want to handle this</span>
    <span style="color: #666666; font-style: italic;">//dynamically and can't automatically determine it from within the package at runtime</span>
&nbsp;
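    private Dictionary&lt;int, string&gt; colnames<span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">//LineageID -&gt; column name; stored for later use within row processing section</span>
    private Dictionary&lt;string, int&gt; colpositions<span style="color: #339933;">;</span>       <span style="color: #666666; font-style: italic;">//column name -&gt; column index</span>
    private Dictionary&lt;int, int&gt; colidsbyposition<span style="color: #339933;">;</span>      <span style="color: #666666; font-style: italic;">//column index -&gt; LineageID</span>
    private Dictionary&lt;int, int&gt; colids<span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">//LineageID -&gt; column index; stored for later use within row processing section</span>
    private Dictionary&lt;int, string&gt; colmap<span style="color: #339933;">;</span>             <span style="color: #666666; font-style: italic;">//column ID -&gt; column name, as stored in the package variable</span>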
&nbsp;
    <span style="color: #666666; font-style: italic;">//Internal column tracking numbers.</span>
    <span style="color: #666666; font-style: italic;">//You could probably avoid using these as separate variables, but...</span>
    <span style="color: #666666; font-style: italic;">// 1. I'm not that clever</span>
    <span style="color: #666666; font-style: italic;">// 2. I really, really wanted to explicitly watch them as they moved around</span>
    private <span style="color: #993333;">int</span> i_error_code_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_column_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_column_nm<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_record_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_record_id_dsc<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_dsc<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_data<span style="color: #339933;">;</span>
&nbsp;
    StringBuilder _sbColIDs <span style="color: #339933;">=</span> new StringBuilder<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    StringBuilder _sbErrorCols <span style="color: #339933;">=</span> new StringBuilder<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    private bool isSourceErrorOutput <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">false</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">// = true;</span>
    private string _OLEDBSourceType <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #339933;">;</span>
&nbsp;
    public override <span style="color: #993333;">void</span> PreExecute<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        base.<span style="color: #202020;">PreExecute</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        colnames <span style="color: #339933;">=</span> new Dictionary&lt;int, string&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colpositions <span style="color: #339933;">=</span> new Dictionary&lt;string, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colids <span style="color: #339933;">=</span> new Dictionary&lt;int, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colidsbyposition <span style="color: #339933;">=</span> new Dictionary&lt;int, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        colmap <span style="color: #339933;">=</span> new Dictionary&lt;int, string&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        try
        <span style="color: #009900;">&#123;</span>
            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>Variables.<span style="color: #202020;">tablecolmap</span>.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> colmap.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
                colmap <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>Dictionary&lt;int, string&gt;<span style="color: #009900;">&#41;</span>Variables.<span style="color: #202020;">tablecolmap</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
        catch <span style="color: #009900;">&#40;</span>Exception exi<span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        IDTSInput100 input <span style="color: #339933;">=</span> base.<span style="color: #202020;">ComponentMetaData</span>.<span style="color: #202020;">InputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        IDTSVirtualInput100 virtInput <span style="color: #339933;">=</span> input.<span style="color: #202020;">GetVirtualInput</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">int</span> colsCount <span style="color: #339933;">=</span> virtInput.<span style="color: #202020;">VirtualInputColumnCollection</span>.<span style="color: #202020;">Count</span><span style="color: #339933;">;</span>
        m_idx <span style="color: #339933;">=</span> new <span style="color: #993333;">int</span><span style="color: #009900;">&#91;</span>colsCount<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        for (int colIndex = 0; colIndex &lt; colsCount; colIndex++)
        {
            IDTSVirtualInputColumn100 column = virtInput.VirtualInputColumnCollection[colIndex];
&nbsp;
            <span style="color: #666666; font-style: italic;">//================================================================</span>
            <span style="color: #666666; font-style: italic;">//pull out the error codes and column IDs</span>
            if (string.Compare(column.Name, &quot;ErrorColumn&quot;, true) == 0)
            {
                i_error_column_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;ErrorCode&quot;, true) == 0)
            {
                i_error_code_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_id&quot;, true) == 0)
            {
                i_error_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;column_nm&quot;, true) == 0)
            {
                i_column_nm = colIndex;
            }
            if (string.Compare(column.Name, &quot;record_id&quot;, true) == 0)
            {
                i_record_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;record_id_dsc&quot;, true) == 0)
            {
                i_record_id_dsc = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_dsc&quot;, true) == 0)
            {
                i_error_dsc = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_data&quot;, true) == 0)
            {
                i_error_data = colIndex;
            }
&nbsp;
            <span style="color: #666666; font-style: italic;">//add our column names to our list for later use</span>
            colnames.Add(column.LineageID, column.Name); <span style="color: #666666; font-style: italic;">//column.LineageID used to look up index of error column name in row</span>
            colids.Add(column.LineageID, colIndex); <span style="color: #666666; font-style: italic;">//column.LineageID used to look up index of error column index position in row</span>
            colidsbyposition.Add(colIndex, column.LineageID);
            try
            {
                colpositions.Add(column.Name, colIndex);
            }
            catch { }
&nbsp;
            try
            {
                <span style="color: #666666; font-style: italic;">//is this column the &quot;key&quot; column we're using to identify the key values for the row? EX: primary key</span>
                <span style="color: #666666; font-style: italic;">//NOTE: we're only doing this for a single member of a compound primary key</span>
                if (string.Compare(column.Name, Variables.tablekeycol, true) == 0) <span style="color: #666666; font-style: italic;">//true = ignore case during comparison</span>
                {
                    key_col_id = colIndex;
                    key_col_name = column.Name;
                }
            }
            catch { }
            <span style="color: #666666; font-style: italic;">//================================================================</span>
&nbsp;
            m_idx[colIndex] = base.HostComponent.BufferManager.FindColumnByLineageID(
                input.Buffer,
                column.LineageID);
        }
    }
&nbsp;
    public override void PostExecute()
    {
        base.PostExecute();
    }
&nbsp;
    public override void Input_ProcessInputRow(InputBuffer Row)
    {
        int colsCount = m_idx.Length;
        int cColLineageKey;
&nbsp;
        if (colsCount &gt; 0)
        <span style="color: #009900;">&#123;</span>
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//stuff the error code into the error_id column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_code_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the value for the &quot;primary key&quot; column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_record_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>key_col_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//store the name of the &quot;primary key&quot; column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_record_id_dsc<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> key_col_name<span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the error description</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_dsc<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>ComponentMetaData.<span style="color: #202020;">GetErrorDescription</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span>.<span style="color: #202020;">Parse</span><span style="color: #009900;">&#40;</span>Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_code_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">ToString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the name and value of the column that failed.</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_error_column_id <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_error_column_id <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_error_column_id  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colmap.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out columnName<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                <span style="color: #666666; font-style: italic;">//columnName should be set</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>cColLineageKey <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> cColLineageKey <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> columnName <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> columnName.<span style="color: #202020;">Length</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colpositions.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>columnName<span style="color: #339933;">,</span> out currentposition<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                    <span style="color: #009900;">&#123;</span>
                                        <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                        <span style="color: #666666; font-style: italic;">//current position should be set</span>
                                    <span style="color: #009900;">&#125;</span>
                                <span style="color: #009900;">&#125;</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>cColLineageKey <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> cColLineageKey <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> currentposition <span style="color: #339933;">&gt;=</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #666666; font-style: italic;">//&amp;&amp; currentposition != null)</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colidsbyposition.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>currentposition<span style="color: #339933;">,</span> out cColLineageKey<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                    <span style="color: #009900;">&#123;</span>
                                        <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                        <span style="color: #666666; font-style: italic;">//current position should be set</span>
                                    <span style="color: #009900;">&#125;</span>
                                <span style="color: #009900;">&#125;</span>
                            <span style="color: #009900;">&#125;</span>
                            <span style="color: #b1b100;">else</span>
                            <span style="color: #009900;">&#123;</span>
                                cColLineageKey <span style="color: #339933;">=</span> cColLineageKey <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                        <span style="color: #b1b100;">else</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #666666; font-style: italic;">//probably a &quot;source error output&quot;</span>
                            cColLineageKey <span style="color: #339933;">=</span> cColLineageKey <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
                            <span style="color: #666666; font-style: italic;">//MAJOR MAJOR MAJOR HACK</span>
                            <span style="color: #666666; font-style: italic;">//apparently, we do NOT persist the ORIGINAL LINEAGEID from source to output, so we need to... adjust... the number.</span>
                            <span style="color: #666666; font-style: italic;">// this is EXCEPTIONALLY RISKY, but since MS &quot;adjusts&quot; the output rows for errors to be &quot;different&quot; from the &quot;it works!&quot; destination</span>
                            <span style="color: #666666; font-style: italic;">// we don't have much of a choice.  In reviewing the numbers, it appears they consistently increment for errors, so we need to increment the</span>
                            <span style="color: #666666; font-style: italic;">// index here to find the right value.  Horrible stuff.  Likely to break.  Enjoy.</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
&nbsp;
                    <span style="color: #666666; font-style: italic;">//Retrieve from the column names dictionary and place column name in error info</span>
                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_column_nm <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_column_nm <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_column_nm  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colnames.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out value<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_column_nm<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> value<span style="color: #339933;">;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
                    <span style="color: #666666; font-style: italic;">//get the missing column value for the key found at the identified &quot;error column&quot;</span>
                    <span style="color: #666666; font-style: italic;">//had issues where the column blew up because of data type conversion issues, so try/catch is here to help handle this</span>
                    try
                    <span style="color: #009900;">&#123;</span>
                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_error_data <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_error_data <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_error_data  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colids.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out colvalue<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                    <span style="color: #666666; font-style: italic;">//NOTE: &quot;bad&quot; data MAY be totally thrown out here, which is why we're using the try/catch</span>
                                    <span style="color: #666666; font-style: italic;">//if the custom CozyRoc row processor dies due to formatting errors then this will throw an exception</span>
                                    <span style="color: #666666; font-style: italic;">//we're just going to ignore that and roll on by</span>
                                    <span style="color: #666666; font-style: italic;">//probably worth revisiting at a later date to see if we can get at the bad data anyway</span>
                                    Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_data<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>colvalue<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">ToString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                                <span style="color: #009900;">&#125;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
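                    catch <span style="color: #009900;">&#40;</span>Exception vEx<span style="color: #009900;">&#41;</span>
                    <span style="color: #009900;">&#123;</span>
                        <span style="color: #666666; font-style: italic;">//deliberately swallowed: the bad value could not be read, so the error data column is left as-is</span>
                    <span style="color: #009900;">&#125;</span>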
&nbsp;
                <span style="color: #009900;">&#125;</span>
                <span style="color: #b1b100;">else</span>
                <span style="color: #009900;">&#123;</span>
                <span style="color: #009900;">&#125;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Developing for NUBIC</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=developing-for-nubic</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 19:46:32 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[dev environment]]></category>
		<category><![CDATA[nubic-dev]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=574</guid>
		<description><![CDATA[NUBIC is the Northwestern University Biomedical Informatics Center in Chicago. Our developers write software, computation, and data analysis tools that support medical research. There is a famous blog post written by Joel Spolsky titled &#8220;The Joel Test.&#8221; It describes some &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nucats.northwestern.edu/clinical-research-resources/data-collection-biomedical-informatics-and-nubic/bioinformatics-overview.html">NUBIC</a> is the Northwestern University Biomedical Informatics Center in Chicago. Our developers write software, computation, and data analysis tools that support medical research.</p>
<hr />
<p>There is a famous blog post written by <a href="http://www.joelonsoftware.com/AboutMe.html">Joel Spolsky</a> titled <a href="http://www.joelonsoftware.com/articles/fog0000000043.html">&#8220;The Joel Test.&#8221;</a> It describes some key parts of what makes a given development shop a great (or scary) place to work. It&#8217;s also a key metric used on <a href="http://careers.stackoverflow.com/">careers.stackoverflow.com</a>, where every employer posting a job is encouraged to rate themselves, and post their own score on &#8220;The Joel Test.&#8221;</p>
<p>I use &#8220;The Joel Test&#8221; as one of the ways that I critique a potential employer, what sort of value they place on developers, and software development as a practice. I&#8217;m especially passionate about software, more than any other professional or personal pursuit in my life, and I want to work for places that value the work I do for them just as highly as I value the work myself. As such, I think &#8220;The Joel Test&#8221; is a pretty good indicator of that aspect of a workplace.</p>
<p>Since I tend to agree with the values put forth in &#8220;The Joel Test,&#8221; I look for a place that scores highly. After working inside NUBIC for just a few months, I&#8217;m pretty happy with what I see here. However, rather than providing a score for NUBIC (I obviously think we score very high), I&#8217;ll just lay out what we do here:</p>
<ol>
<li><strong>Do you use source control?</strong> We have an internal Git server, and we also <a href="https://github.com/nubic">publish much of our code on GitHub</a>.</li>
<li><strong>Can you make a build in one step?</strong> We have a <a href="http://jenkins-ci.org/">Jenkins CI server</a> that is tied to our Git project repositories. It automatically runs our test suite when new code is committed to the master branch. It also sends out automated emails when a build fails.</li>
<li><strong>Do you make daily builds?</strong> As often as code is committed, it is built and tested. For projects under active development it&#8217;s common to see multiple builds per day.</li>
<li><strong>Do you have a bug database?</strong> We use a combination of <a href="http://www.redmine.org/">Redmine</a> for internal projects, and <a href="https://github.com/">GitHub</a> for open source projects.</li>
<li><strong>Do you fix bugs before writing new code?</strong> We prioritize bugs over new features. For the purpose of this question I&#8217;m defining bugs as things that are broken in the system, as opposed to simply inconvenient, or inefficient. However, we definitely label inconvenient or inefficient code as a bug if it is preventing meaningful work from getting done (rightfully so), and will prioritize it as such.</li>
<li><strong>Do you have an up-to-date schedule?</strong> We scope and schedule work for all of our projects, and keep clients in the loop at every step of the process, whether we&#8217;re ahead or behind schedule.</li>
<li><strong>Do you have a spec?</strong> The vast majority of projects don&#8217;t have specs for every feature; some of them are as simple as, &#8220;Can you change this thing from red to blue, and move it to the left by 5 pixels?&#8221; which I suppose qualifies as a spec, but may be communicated verbally as opposed to going through a formal process. However, anytime there&#8217;s major work to be done, or any place that we feel has lots of ambiguity, we work it out verbally or on a whiteboard, and then commit the details of that session back to the bug tracking system. After a project launch we also produce documentation for internal purposes, as well as training documents and screencasts for the benefit of end users.</li>
<li><strong>Do programmers have quiet working conditions?</strong> This is one of my favorite things about NUBIC. Though we do work largely in a set of cubicles, we accomplish a quiet working environment in several ways. First, and foremost, we respect each other by taking conversations that we expect to last more than a minute or two away from the common workspace, and into side offices with closed doors that keep the sound contained. Second, most of us also have noise-cancelling headphones that we use to either listen to our favorite music while coding, or simply to soften the ambient office noise (which is nearly silent most of the time anyway). Third, we prefer asynchronous communication (email + IM) over coming to a person&#8217;s desk unannounced. Unless you&#8217;re on the support rotation for the week, email can safely be checked just 2-3 times a day, and on a typical day I will have fewer than two IM conversations. We also like to eat lunch together often, which sometimes serves as a conversation starter for projects, in place of formal meetings.</li>
<li><strong>Do you use the best tools money can buy?</strong> We&#8217;re not talking about the most expensive tools &#8211; we&#8217;re talking about the best tools to do the job. Besides personal preferences, the tools we&#8217;re given are a help, not a hindrance to our work. For example, I personally prefer <a href="http://www.fogcreek.com/fogbugz/">FogBugz</a> for issue tracking over <a href="http://github.com">GitHub</a>, due to its integrated, and more advanced project management features, but GitHub is by no means a hindrance to the workflow. There are also regular, open discussions that occur regarding tool choice. Programmers are not only free to experiment with various tools, but encouraged to do so, bringing their experiences back to the group so everyone can benefit. We maintain a coding library that holds maybe 100 books that have been collected over the years, and cover topics ranging from various programming languages, to guides on design, typeface choice, and the art of software craftsmanship. As for the day-to-day tools, we&#8217;re lucky to have a solid, flexible hardware setup. Here&#8217;s a photo of the standard issue desk: 15&#8243; MacBook Pro + Apple Cinema Display, and push-pin friendly walls for hanging up notes or creating a makeshift <a href="http://www.infoq.com/resource/articles/hiranabe-lean-agile-kanban/en/resources/image5.jpg">kanban</a> board.
<p><div id="attachment_608" class="wp-caption alignnone" style="width: 459px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/1219111201.jpg"><img class="size-full wp-image-608" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/1219111201.jpg" alt="" width="449" height="337" /></a><p class="wp-caption-text">MacBook Pro, Apple cinema display, whiteboard for prototyping </p></div></li>
<li><strong>Do you have testers?</strong> In addition to test- and behavior-driven development practices that each developer employs, we have people who specialize in user experience and UI design. We also encourage usability testing of the sort outlined in Steve Krug&#8217;s <a href="http://www.amazon.com/Rocket-Surgery-Made-Easy-Yourself/dp/0321657292">&#8220;Rocket Surgery Made Easy&#8221;</a>. Finding a coworker interested in pair programming is also very easy. In NUBIC, responsibility for testing software is a partnership between developers, clients, and groups that verify projects on staging systems, as well as anyone else who is simply interested in assisting a given project. <strong>We do not, however, have dedicated testers</strong> whose job it is to do nothing but think of ways to break the system (so that those failures can be prevented). There&#8217;s been quite a bit of discussion about this, and we&#8217;re hoping that as our group grows, we&#8217;ll be able to add dedicated testers. The reasons for wanting dedicated testers are many, but I think the greatest benefit is that thinking of ways to break things is a fundamentally different role from that of a developer (whose job it is to think of ways to make things work). Both sides are important.</li>
<li><strong>Do new candidates write code during their interview?</strong> Developers are expected to be able to demonstrate their ability to implement a few common algorithms, as well as design a small system, live, during the interview. If you&#8217;re awesome at software, it&#8217;s not a big deal. It basically proves that you have three things: a solid background in algorithms, an ability to reason about a system and accept feedback from others on its design, and an ability to ask intelligent questions of the client about what they want.</li>
<li><strong>Do you do hallway usability testing?</strong> Some people do this more than others, but anytime you&#8217;re implementing a new feature where you&#8217;d like some feedback, it&#8217;s encouraged that you consult with people around you. The culture in NUBIC is such that I&#8217;ve never run into a person who wasn&#8217;t willing to give you five minutes of their time to examine something you&#8217;re working on. This often turns into a discussion of several different approaches to implementation, and helps lead to software that benefits from multiple perspectives, without adding a bunch of wasted time to the project schedule in formal meetings.</li>
</ol>
<p>On top of that, one of my favorite things about working at Northwestern is the focus on collaboration, and leveraging the large community of brilliant people around you. Northwestern is one of the leading research universities in the country, and it has no shortage of smart people who are passionate about what they do.</p>
<p>The panoramic views of Lake Michigan aren&#8217;t bad either.</p>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/10281108181.jpg"><img class="alignnone size-full wp-image-616" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/10281108181.jpg" alt="" width="2048" height="1536" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Your Mobile Phone a HIPAA Violator?</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=is-your-mobile-phone-a-hipaa-violator</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 17:11:03 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>
		<category><![CDATA[HIT Policy]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=560</guid>
		<description><![CDATA[The recent firestorm over the discovery of a rootkit on many mobile phones has raised the specter of federal wiretap violations, as discussed in a recent Forbes article.  The rootkit manufacturer, Carrier IQ, denies that it collected keystrokes.  However, a &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The recent firestorm over the discovery of a rootkit on many mobile phones has raised the specter of federal wiretap violations, as discussed in a recent <a title="Phone 'Rootkit' Maker Carrier IQ May Have Violated Wiretap Law In Millions Of Cases" href="http://www.forbes.com/sites/andygreenberg/2011/11/30/phone-rootkit-carrier-iq-may-have-violated-wiretap-law-in-millions-of-cases/">Forbes article.</a>  The rootkit manufacturer, Carrier IQ, <a href="http://www.pcmag.com/article2/0,2817,2397156,00.asp">denies that it collected keystrokes</a>.  However, a recent video post appears to show the software <a href="http://www.geek.com/articles/mobile/security-researcher-responds-to-carrieriq-with-video-proof-20111129/">doing exactly that</a>.</p>
<p>What has not hit the press yet is the issue of potential HIPAA violations.  It is important to remember that under HITECH, covered entities are responsible for breaches even if they <em>didn&#8217;t know, and by reasonable diligence would not have known.</em>  In other words, if your phone sent PHI to the phone company, you are potentially <a title="HIPAA Act Enforcement Interim Final Rule" href="http://www.hhs.gov/ocr/privacy/hipaa/administrative/enforcementrule/enfifr.pdf">liable for $100 to $50,000 per violation</a> (which will probably be interpreted as each compromised message) and up to $1.5 million total.</p>
<p>Sprint, AT&amp;T and T-Mobile admit to using Carrier IQ.  Verizon says it does not.  I&#8217;m sure that my hospital is not ready to ban all non-Verizon phones&#8230;yet.</p>
<p><del>This is not unique to mobile phones.  The new <a href="http://www.pcmag.com/article2/0,2817,2397014,00.asp">Kindle Fire </a>provides web browsing, so you might use it to access your web-based EHR, right?  However, the actual rendering of the web page is done in the Amazon Cloud and a compressed version of the page is sent to your device.  From a security standpoint, this means that Amazon must  execute what amounts to a <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack">man-in-the-middle attack</a> on your secure browsing session.  Whether Amazon looks at your data, or not, is irrelevant.  They can do it at any time, and you would be none the wiser.</del></p>
<p>Thanks to Matt, who pointed out that EFF had a nice evaluation of <a href="https://www.eff.org/2011/october/amazon-fire’s-new-browser-puts-spotlight-privacy-trade-offs">security on the Kindle Fire.</a>  It turns out that HTTPS sessions bypass the cloud rendering engine, so things are not as dire as I had thought.  Amazon also gives assurances that it is not logging content.  Perhaps, but since it is logging the URL and session token, it can probably reproduce much of the content at a later date.</p>
<p>Ultimately, this is why domain-specific privacy laws (e.g. health privacy, mail privacy or video rental privacy) are doomed to fail.  Our digital lives are far too interconnected for any service provider to create separate services to comply with the separate regulations.  As a health care organization, we cannot generally share location data more precise than a 3-digit zip code.  The phone company can collect, and give to advertisers, my location down to a few feet!  I&#8217;m sure that the phone company could tell, if it wanted to, what chronic medications I take simply based on which lab I go to each month.  We need a fundamental right of digital privacy.  That right needs to be based on the concept of information use, rather than on information access.  Anyone who uses my digital data should admit that they use it and justify why.</p>
<p>A good analogy for this is credit ratings.  Anyone who uses a credit rating to grant or deny a service is required to say what information they used to make the decision.  If it appears that someone used information that they should not have, like age or race, they need to be able to explain how they reached the decision without using that data.  The same should be true of all of the digital &#8220;breadcrumbs&#8221; that we scatter across the digital landscape as we attempt to live our digital lives.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Medical Science and the Blogosphere</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=medical-science-and-the-blogosphere</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 00:42:25 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=536</guid>
		<description><![CDATA[Electronic publication of results is rapidly supplanting conventional scientific journals.  The challenge is to separate fact from fiction.  While peer review is far from perfect, it is still pretty good.  In the blogosphere, volume can overwhelm substance, and truthiness can &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Electronic publication of results is rapidly supplanting conventional scientific journals.  The challenge is to separate fact from fiction.  While peer review is far from perfect, it is still pretty good.  In the blogosphere, volume can overwhelm substance, and <a title="Wikipedia definition" href="http://en.wikipedia.org/wiki/Truthiness">truthiness</a> can overwhelm truth.  When the blogosphere gets combined with legal threats,  rational discourse can be silenced.  Rhys Morgan, a high school student, got a lesson in legal intimidation when he questioned an unpublished medical therapy.  His <a href="http://rhysmorgan.co/2011/11/threats-from-the-burzynski-clinic/">description of the ordeal</a> is both enlightening and cautionary.  I applaud both his articulateness and his tenacity.</p>
<p>As we increasingly use electronic media to accelerate the spread of scientific and medical knowledge, we need to think seriously about how we ensure that the information we spread is the most correct, not just the loudest, truthiest, sexiest, or best financed.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Best iPad Stylus</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=best-ipad-stylus</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 19:41:26 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=524</guid>
		<description><![CDATA[As we try to use iPads to replace the medical clipboard, the need for an iPad writing instrument becomes paramount.  As every fountain pen user will attest, writing instruments are very personal items.  There have been many queries on the &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As we try to use iPads to replace the medical clipboard, the need for an iPad writing instrument becomes paramount.  As every fountain pen user will attest, writing instruments are very personal items.  There have been many queries on the web trying to find the best iPad stylus.  There have even been <a title="stylus reviews" href="http://www.imedicalapps.com/2011/02/ipad-stylus-review-best-handwriting-touch-screen/">reviews</a>.  To me, there are two main criteria for a good stylus.  First, low friction.  Finding a stylus with the right friction on the iPad screen is not easy, especially since it changes with the amount of finger grease on the glass.  Second, is accuracy.  I want to be able to write with the same resolution as a pen on paper.</p>
<p>While I have not tried every stylus on the market, I have tried a number.  The major problem with the rubber-tipped styli is that the friction seems to change as the rubber gets worn.  The iFaraday stylus uses a cloth tip that glides smoothly and does not seem to change as much as the rubber.  The <a title="iFaraday store" href="http://www.ifaraday.com/store.html">iFaraday Artist, Firm Dome</a> is the best I have found by far.</p>
<p>Of course, nothing is perfect.  I would love to see the firm dome available in the Rx stylus, which has a cap to cover the tip when not in use.  I would also love to see the Rx cap be clip on, rather than screw on.  Even so, this has become my go-to stylus for note taking, sketching, or making marginal notes in papers I read on the iPad.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ETL Assistant &#8211; Using CozyRoc&#8217;s Parallel Loop Task</title>
		<link>http://informatics.northwestern.edu/blog/edw/2011/11/etl-assistant-using-cozyrocs-parallel-loop-task/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=etl-assistant-using-cozyrocs-parallel-loop-task</link>
		<comments>http://informatics.northwestern.edu/blog/edw/2011/11/etl-assistant-using-cozyrocs-parallel-loop-task/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 18:50:11 +0000</pubDate>
		<dc:creator>Eric Whitley</dc:creator>
				<category><![CDATA[EDW]]></category>
		<category><![CDATA[ETL Assistant]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=454</guid>
		<description><![CDATA[In our continuing series on building your own templated, reusable "point-and-click" ETL solution using CozyRoc's components for Microsoft SSIS, we'll be stepping into the "Parallel Loop Task".  Microsoft's default loop task is serial. If you ever need to parallelize execution of SSIS tasks, overcoming this can be quite a challenge.  With the Parallel Loop Task you can, with a few quick clicks, enable parallel execution of tasks and packages to maximize the utilization of your available resources. <a href="http://informatics.northwestern.edu/blog/edw/2011/11/etl-assistant-using-cozyrocs-parallel-loop-task/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the TLDR crowd &#8211; I&#8217;m supplying downloads of the packages so you can just open them and play.</p>
<p>The download &#8220;<a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/CozyRocParallelLoopDemo.zip">Cozy Roc Parallel Loop Demo Files</a>&#8221; contains:</p>
<ul>
<li>setup_sql.txt</li>
<li>SequentialLoop.dtsx</li>
<li>ParallelLoop.dtsx</li>
</ul>
<p>To use the demo files you&#8217;ll need to have at least the evaluation copy of the CozyRoc components installed (you can get 32-bit and x64 versions from the CozyRoc site: <a href="http://www.cozyroc.com/products">http://www.cozyroc.com/products</a>).</p>
<h2>Loops in SSIS</h2>
<p>SSIS provides a very handy loop task &#8211; you supply a collection (of type object) and iterate through that object, executing steps or processes for each item in the collection.</p>
<p>Microsoft&#8217;s description (<a href="http://msdn.microsoft.com/en-us/library/ms139956.aspx">http://msdn.microsoft.com/en-us/library/ms139956.aspx</a>) of the task:</p>
<blockquote><p>&#8220;The For Loop container defines a repeating control flow in a package. The loop implementation is similar to the <strong>For</strong> looping structure in programming languages. In each repeat of the loop, the For Loop container evaluates an expression and repeats its workflow until the expression evaluates to False.&#8221;</p></blockquote>
<p>Great &#8211; we can now loop through a set of data.</p>
<ul>
<li>For a given <strong>group</strong> of something (a collection)</li>
<li><strong>Iterate</strong> through the collection (a variable / instance)</li>
<li>For each <strong>instance</strong>, execute a process</li>
</ul>
<p>In the case of ETL Assistant we use this to do the following:</p>
<ul>
<li>We have a concept of a scheduling &#8220;<strong>group</strong>&#8221; &#8211; a set of source::destination table mappings. Let&#8217;s say I want to manage a set of HR-related items (department list, employee list, address information, etc.) as a group (easier than managing individual tables).  I can put them into a &#8220;group&#8221; (collection).  Let&#8217;s call this &#8220;<em>HR tables</em>.&#8221;  I can do the same with a set of patient information (patient / person list, encounter / visit information, and possibly some other patient demographic information).  Let&#8217;s call this &#8220;<em>Patient tables</em>.&#8221;</li>
<li>I can, for each group, pull back a list of tables (instances)</li>
<li>For each table I can execute a dynamic ETL on them to pump data from a source to a destination (EX: Oracle::employee -&gt; SQL Server::employee)</li>
</ul>
<p>The loop task does a great job managing simple collections and executing an operation per item.</p>
<p>The problem?  I&#8217;m now executing all of this serially.</p>
<div id="attachment_457" class="wp-caption alignnone" style="width: 676px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/standard_loop.png"><img class="size-full wp-image-457" title="SSIS Standard ForEach Loop" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/standard_loop.png" alt="SSIS Standard ForEach Loop" width="666" height="389" /></a></div>
<p>This means if I have a fairly beefy server I&#8217;m still <em>potentially</em> sitting idle while I do a simple set of ETL operations.  You have several ways to address this, but one I&#8217;ve found attractive is to convert from a serial ForEach loop to a <em>parallel</em> ForEach loop using the Parallel Loop Task from CozyRoc.  This will let us do n-parallel executions of a given operation.  If you have a 64 core host, for example, and the diagram above represented tables you wanted to load from a remote source, you could execute A, B, and C loads in parallel.</p>
<div class="mceTemp">
<dl id="attachment_459" class="wp-caption alignnone" style="width: 369px;">
<dt class="wp-caption-dt"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/parallel_loop_idea.png"><img class="size-full wp-image-459" title="CozyRoc Parallel Loop Idea" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/parallel_loop_idea.png" alt="CozyRoc Parallel Loop Idea" width="359" height="386" /></a></dt>
</dl>
</div>
<p>Let&#8217;s get back to that example using the HR tables (department list, employee list, address information).  I can create a &#8220;group&#8221; (&#8220;HR&#8221;), then place these three tables into the HR group.  When I run a process to pull over the HR group I reference the group, pull back the three table references and place them into a collection.  I iterate through the collection and, for each of the n items in the collection, execute a task.</p>
<div id="attachment_460" class="wp-caption alignnone" style="width: 650px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/cozyroc_parallel_loop.png"><img class="size-full wp-image-460" title="CozyRoc Parallel Loop Task" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/cozyroc_parallel_loop.png" alt="CozyRoc Parallel Loop Task" width="640" height="367" /></a><p class="wp-caption-text">CozyRoc Parallel Loop Task</p></div>
<p>&nbsp;</p>
<h2>The Sequential ForEach Loop in SSIS</h2>
<p>Let&#8217;s do some quick setup steps to prep this test scenario:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--create a test schema</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> SCHEMA cozyroc AUTHORIZATION dbo
<span style="color: #993333; font-weight: bold;">GO</span> 
&nbsp;
<span style="color: #808080; font-style: italic;">--this is our &quot;group&quot; table</span>
<span style="color: #808080; font-style: italic;">--EX: HR</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> cozyroc<span style="color: #66cc66;">.</span>etl_groups <span style="color: #66cc66;">&#40;</span>
	group_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">IDENTITY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	group_nm <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	group_dsc <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">CONSTRAINT</span> <span style="color: #66cc66;">&#91;</span>PK_cozyroc_etl_groups_group_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> CLUSTERED
<span style="color: #66cc66;">&#40;</span>
	<span style="color: #66cc66;">&#91;</span>group_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">ASC</span>
<span style="color: #66cc66;">&#41;</span><span style="color: #993333; font-weight: bold;">WITH</span> <span style="color: #66cc66;">&#40;</span>PAD_INDEX  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> STATISTICS_NORECOMPUTE  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> IGNORE_DUP_KEY <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span>
ALLOW_ROW_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">,</span> ALLOW_PAGE_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--this is our &quot;table&quot; table</span>
<span style="color: #808080; font-style: italic;">--EX: employees, addresses, etc.</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> cozyroc<span style="color: #66cc66;">.</span>etl_tables <span style="color: #66cc66;">&#40;</span>
	table_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">IDENTITY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	table_nm <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	table_dsc <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">CONSTRAINT</span> <span style="color: #66cc66;">&#91;</span>PK_cozyroc_etl_tables_table_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> CLUSTERED
<span style="color: #66cc66;">&#40;</span>
	<span style="color: #66cc66;">&#91;</span>table_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">ASC</span>
<span style="color: #66cc66;">&#41;</span><span style="color: #993333; font-weight: bold;">WITH</span> <span style="color: #66cc66;">&#40;</span>PAD_INDEX  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> STATISTICS_NORECOMPUTE  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> IGNORE_DUP_KEY <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span>
ALLOW_ROW_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">,</span> ALLOW_PAGE_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--this is our associative table to store</span>
<span style="color: #808080; font-style: italic;">--  the mapping from group::table</span>
<span style="color: #808080; font-style: italic;">--  I'm using this in the example because in later</span>
<span style="color: #808080; font-style: italic;">--  posts we'll allow the table to be &quot;grouped&quot;</span>
<span style="color: #808080; font-style: italic;">--  multiple times</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> cozyroc<span style="color: #66cc66;">.</span>etl_group_tables <span style="color: #66cc66;">&#40;</span>
	group_table_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">IDENTITY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>
	group_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	table_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
<span style="color: #993333; font-weight: bold;">CONSTRAINT</span> <span style="color: #66cc66;">&#91;</span>PK_cozyroc_etl_group_tables_group_table_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> CLUSTERED
<span style="color: #66cc66;">&#40;</span>
	<span style="color: #66cc66;">&#91;</span>group_table_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">ASC</span>
<span style="color: #66cc66;">&#41;</span><span style="color: #993333; font-weight: bold;">WITH</span> <span style="color: #66cc66;">&#40;</span>PAD_INDEX  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> STATISTICS_NORECOMPUTE  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> IGNORE_DUP_KEY <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span>
ALLOW_ROW_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">,</span> ALLOW_PAGE_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--insert a sample group</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> cozyroc<span style="color: #66cc66;">.</span>etl_groups <span style="color: #66cc66;">&#40;</span>group_nm<span style="color: #66cc66;">,</span> group_dsc<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'HR'</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'HR Group'</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--insert some sample tables</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> cozyroc<span style="color: #66cc66;">.</span>etl_tables <span style="color: #66cc66;">&#40;</span>table_nm<span style="color: #66cc66;">,</span> table_dsc<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Employees'</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'Employees Table'</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> cozyroc<span style="color: #66cc66;">.</span>etl_tables <span style="color: #66cc66;">&#40;</span>table_nm<span style="color: #66cc66;">,</span> table_dsc<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Departments'</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'Departments Table'</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> cozyroc<span style="color: #66cc66;">.</span>etl_tables <span style="color: #66cc66;">&#40;</span>table_nm<span style="color: #66cc66;">,</span> table_dsc<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Addresses'</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'Addresses Table'</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--blindly cross join everything</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> cozyroc<span style="color: #66cc66;">.</span>etl_group_tables <span style="color: #66cc66;">&#40;</span>group_id<span style="color: #66cc66;">,</span> table_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">SELECT</span>
	g<span style="color: #66cc66;">.</span>group_id<span style="color: #66cc66;">,</span> t<span style="color: #66cc66;">.</span>table_id
<span style="color: #993333; font-weight: bold;">FROM</span>
	cozyroc<span style="color: #66cc66;">.</span>etl_groups g<span style="color: #66cc66;">,</span>
	cozyroc<span style="color: #66cc66;">.</span>etl_tables t
&nbsp;
<span style="color: #808080; font-style: italic;">--now let's also create a logging table</span>
<span style="color: #808080; font-style: italic;">-- this is a placeholder for more complex</span>
<span style="color: #808080; font-style: italic;">-- operations</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> cozyroc<span style="color: #66cc66;">.</span>parallel_test <span style="color: #66cc66;">&#40;</span>
	log_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">IDENTITY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	group_table_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	group_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	table_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
	execution_dts datetime2<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">7</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> GETDATE<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">CONSTRAINT</span> <span style="color: #66cc66;">&#91;</span>PK_cozyroc_parallel_test_log_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> CLUSTERED
<span style="color: #66cc66;">&#40;</span>
	<span style="color: #66cc66;">&#91;</span>log_id<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">ASC</span>
<span style="color: #66cc66;">&#41;</span><span style="color: #993333; font-weight: bold;">WITH</span> <span style="color: #66cc66;">&#40;</span>PAD_INDEX  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> STATISTICS_NORECOMPUTE  <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span> IGNORE_DUP_KEY <span style="color: #66cc66;">=</span> OFF<span style="color: #66cc66;">,</span>
ALLOW_ROW_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">,</span> ALLOW_PAGE_LOCKS  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">ON</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ON</span> <span style="color: #66cc66;">&#91;</span><span style="color: #993333; font-weight: bold;">PRIMARY</span><span style="color: #66cc66;">&#93;</span></pre></div></div>

<p>&nbsp;</p>
<p>This will have created several tables:</p>
<ul>
<li>cozyroc.etl_groups (&#8220;HR&#8221;)</li>
<li>cozyroc.etl_tables (&#8220;employees,&#8221; &#8220;addresses,&#8221; etc.)</li>
<li>cozyroc.etl_group_tables (mapping &#8220;employees&#8221; to the &#8220;HR&#8221; group, for example)</li>
<li>cozyroc.parallel_test (our fake table we&#8217;re using to test the parallel loop)</li>
</ul>
<p>You don&#8217;t need all of this for the CozyRoc Parallel Loop Task to work, but I&#8217;m trying to just introduce some examples we&#8217;re going to use in later posts related to dynamic ETL.</p>
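<p>As a quick sanity check on the setup, a query along these lines should list the tables mapped to the HR group. This is only a sketch against the sample tables created above; it assumes the setup script ran without errors and that &#8220;HR&#8221; is the only group you inserted.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">--list each group::table mapping by name (sketch, uses only the sample tables above)
SELECT
	gt.group_table_id,
	g.group_nm,
	t.table_nm
FROM
	cozyroc.etl_group_tables gt
	INNER JOIN cozyroc.etl_groups g ON g.group_id = gt.group_id
	INNER JOIN cozyroc.etl_tables t ON t.table_id = gt.table_id
WHERE
	g.group_nm = 'HR'
ORDER BY
	gt.group_table_id</pre></div></div>

<p>With the sample data you should get three rows back, one per table in the group. Each group_table_id is the value the loop will hand to its task later on.</p>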
<p>Now let&#8217;s create a package</p>
<ol>
<li>Launch BIDS and create a new SSIS project</li>
<li>Create an OLEDB connection to the server and database where you created the tables (in my case, that&#8217;s a database called &#8220;cozyroc&#8221; on localhost)</li>
<li>For convenience, make sure you have the &#8220;variables&#8221; panel open (SSIS -&gt; Variables)</li>
<li>Drag a few things onto the workspace:
<ol>
<li>an Execute SQL Task</li>
<li>a ForEach Loop Container task</li>
<li>drag another Execute SQL Task into the ForEach Loop Container (make sure it&#8217;s placed inside of the container)</li>
</ol>
</li>
<li>Create a package-level variable called &#8220;mylistoftables&#8221; &#8211; make it of type &#8220;Object&#8221;</li>
<li>Click on the ForEach Loop Container so it&#8217;s highlighted &#8211; now create a variable called &#8220;iter&#8221; and make it of type &#8220;Int32.&#8221;  Clicking on the ForEach Loop and then creating the variable will scope &#8220;iter&#8221; to the loop &#8211; make sure the scope of the variable is correct.</li>
<li>Now let&#8217;s create another variable also scoped to the ForEach Loop.  Let&#8217;s call it &#8220;sql_insert&#8221; and make it of type String.  We&#8217;re going to set this up to hold a sample insert statement so we can watch the loop in action.
<ol>
<li>For the &#8220;sql_insert&#8221; variable, set EvaluateAsExpression to &#8220;True&#8221;</li>
<li>Open the Expression and enter the following:
<pre>"insert into cozyroc.parallel_test (group_table_id, group_id, table_id)
select gt.group_table_id, gt.group_id, gt.table_id
from cozyroc.etl_group_tables gt
where gt.group_table_id = " + (DT_WSTR, 100) @[User::iter]</pre>
</li>
</ol>
</li>
<li>Connect the first SQL Task to the ForEach Loop</li>
</ol>
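<p>To make the expression concrete: assuming &#8220;iter&#8221; holds the value 2 on a given pass through the loop (an illustrative value, not something you need to set yourself), the &#8220;sql_insert&#8221; variable should evaluate to a statement like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">--what sql_insert should look like when iter = 2 (illustrative value)
insert into cozyroc.parallel_test (group_table_id, group_id, table_id)
select gt.group_table_id, gt.group_id, gt.table_id
from cozyroc.etl_group_tables gt
where gt.group_table_id = 2</pre></div></div>

<p>You can paste a statement like that into SQL Server Management Studio to confirm it runs before wiring it into the package. Keep in mind that doing so adds an extra row to cozyroc.parallel_test.</p>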
<p>Your package should now look something like&#8230;</p>
<div id="attachment_465" class="wp-caption alignnone" style="width: 725px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_basic_package1.png"><img class="alignnone size-full wp-image-471" title="ssis_loop_sql_basic_package" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_basic_package1.png" alt="" width="715" height="375" /></a><p class="wp-caption-text">SSIS ForEach Loop Package Setup</p></div>
<p>Now let&#8217;s set up the first SQL Task (the one outside the loop)</p>
<ol>
<li>Double-click the first SQL Task to open the configuration</li>
<li>On the &#8220;General&#8221; panel,
<ol>
<li>Set the &#8220;Result Set&#8221; property to &#8220;Full Result Set&#8221;</li>
<li>Set the Connection property to point to your database (ex: &#8220;localhost&#8221;)</li>
<li>Set the SQLSourceType to &#8220;Direct Input&#8221;</li>
<li>Set the SQLStatement to &#8220;select * from cozyroc.etl_group_tables&#8221;</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_collection_statement.png"><img class="alignnone size-full wp-image-467" title="ssis_loop_sql_collection_statement" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_collection_statement.png" alt="ssis_loop_sql_collection_statement" width="494" height="96" /></a></li>
</ol>
</li>
<li>Now go to the &#8220;Result Set&#8221; panel
<ol>
<li>Add a result with &#8220;Result Name&#8221; set to &#8220;0&#8221; (required for a full result set) and set &#8220;Variable Name&#8221; to point to your collection &#8220;User::mylistoftables&#8221; &#8211; this is where we&#8217;re going to store the results of the query</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_collection_result.png"><img class="alignnone size-full wp-image-468" title="ssis_loop_sql_collection_result" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_collection_result.png" alt="ssis_loop_sql_collection_result" width="400" height="57" /></a></li>
</ol>
</li>
</ol>
<p>What did we just do here?  We told the SQL Task to execute a query (&#8220;get me everything from the cozyroc.etl_group_tables table&#8221;) and then store the results in our &#8220;mylistoftables&#8221; object.  Pretty straightforward.</p>
<p>Let&#8217;s proceed to setting up the loop</p>
<ol>
<li>Double-click on your ForEach Loop Container to open the properties panels</li>
<li>On the &#8220;Collection&#8221; panel
<ol>
<li>Set the &#8220;Enumerator&#8221; to ForEach ADO Enumerator</li>
<li>Set the ADO object source variable to your collection &#8220;User::mylistoftables&#8221;</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_foreach_collection.png"><img class="alignnone size-full wp-image-469" title="ssis_loop_foreach_collection" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_foreach_collection.png" alt="" width="518" height="363" /></a></li>
</ol>
</li>
<li>Now on the &#8220;Variable Mappings&#8221; panel
<ol>
<li>Set the variable to &#8220;User::iter&#8221; and the Index to &#8220;0&#8221; (first column of the result set)</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_variable_iter.png"><img class="alignnone size-full wp-image-470" title="ssis_loop_sql_variable_iter" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_loop_sql_variable_iter.png" alt="" width="362" height="56" /></a></li>
</ol>
</li>
</ol>
<p>In this step we told the ForEach Loop Container to loop through the &#8220;mylistoftables&#8221; collection and, for each row, set the &#8220;User::iter&#8221; variable to the value in the first column (group_table_id, since the query selects everything from cozyroc.etl_group_tables).  Keep in mind &#8211; and this is critical for later &#8211; <em>you&#8217;re using the User::iter variable scoped to the ForEach Loop Container</em>.</p>
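<p>Because the loop maps by column position, you may prefer to list the columns explicitly in the first SQL Task rather than relying on &#8220;select *&#8221;. This is an optional variation, not what the demo package uses; it simply keeps index 0 pointing at group_table_id even if the table definition changes later.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">--optional alternative to "select *" for the first SQL Task
--column order here determines the Index values in Variable Mappings
SELECT
	gt.group_table_id,	--Index 0, mapped to User::iter
	gt.group_id,		--Index 1, not mapped in this example
	gt.table_id		--Index 2, not mapped in this example
FROM
	cozyroc.etl_group_tables gt
ORDER BY
	gt.group_table_id</pre></div></div>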
<p>Alright &#8211; we&#8217;re almost done setting up the basic loop.  Now we just have to wire up a task that the loop executes.  In the SQL Task within your loop, set the task to execute the &#8220;sql_insert&#8221; statement:</p>
<ol>
<li>Double-click the second SQL Task to open the configuration</li>
<li>On the &#8220;General&#8221; panel,
<ol>
<li>Set the Connection property to point to your database (ex: &#8220;localhost&#8221;)</li>
<li>Set the SQLSourceType to &#8220;Variable&#8221;</li>
<li>Set the SQLStatement to your &#8220;User::sql_insert&#8221; variable</li>
</ol>
</li>
</ol>
<p>Run the package and, if there are no errors, pop over to SQL Server Management Studio for a minute.</p>
<p>Run a query to quickly look at the results of the package:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> cozyroc<span style="color: #66cc66;">.</span>parallel_test</pre></div></div>

<p>You should see something like&#8230;</p>
<pre>log_id	group_table_id	group_id	table_id	execution_dts
1	1	1	1	2011-11-04 13:15:00.9500000
2	2	1	2	2011-11-04 13:15:00.9800000
3	3	1	3	2011-11-04 13:15:01.0100000</pre>
<p>Note the dates and times. See how there are slight differences between them? They clearly follow a pattern where later group_table_ids have later timestamps. This is the result of the loop running <em>sequentially</em>.</p>
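<p>If you&#8217;d rather not eyeball the timestamps, a small aggregate query can put a number on the gap. This is a sketch against the cozyroc.parallel_test table from the setup script; it assumes the table holds rows from a single run.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">--measure the time between the first and last insert
SELECT
	MIN(execution_dts) AS first_insert,
	MAX(execution_dts) AS last_insert,
	DATEDIFF(millisecond, MIN(execution_dts), MAX(execution_dts)) AS spread_ms
FROM
	cozyroc.parallel_test</pre></div></div>

<p>For the three sample rows shown above, spread_ms should come out to 60.</p>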
<h2>Converting to the CozyRoc Parallel Loop Task</h2>
<p>Let&#8217;s upgrade this to a Parallel Loop Task.  Hang on &#8211; things are about to get weird.</p>
<p>First things first.  Let&#8217;s quickly throw on some more components and variables as well as tweak some other bits.</p>
<ol>
<li>Drag a &#8220;Parallel Loop Task&#8221; onto the canvas</li>
<li>Delete the link from your first SQL Task to the ForEach Loop.  Where we&#8217;re going we don&#8217;t need that flow.</li>
<li>Now connect that first SQL Task to the Parallel Loop Task</li>
<li>Double-click on the Parallel Loop Task to open up the configuration panel
<ol>
<li>Click on the Package Connection property and set the connection to &#8220;&lt;New Connection&gt;.&#8221;  When the dialog box opens, make sure &#8220;Connection to Current Package&#8221; is checked and hit &#8220;OK.&#8221;  We just told the Parallel Loop Task to talk to this package when executing.  Right &#8211; this basically just became a &#8220;meta package&#8221; with execution steps.  Think of this like its own self-referencing parent-child package.</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection.png"><img class="alignnone size-full wp-image-473" title="ssis_parallel_loop_new_connection" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection.png" alt="" width="473" height="483" /></a></li>
<li>Now &#8211; still inside the configuration panel of the Parallel Loop Task &#8211; click on the ForEachLoop configuration item &#8211; a new popup should appear.  Click on the name of your ForEach Loop within this package.</li>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection_3.png"><img class="alignnone size-full wp-image-474" title="ssis_parallel_loop_new_connection_3" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection_3.png" alt="" width="372" height="252" /></a></li>
</ol>
</li>
<li>Your final Parallel Loop Task configuration should look something like this
<ol>
<li><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection_2.png"><img class="alignnone size-full wp-image-475" title="ssis_parallel_loop_new_connection_2" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_new_connection_2.png" alt="" width="396" height="138" /></a></li>
</ol>
</li>
</ol>
<p>Now we&#8217;re cooking.  Only a few simple changes left.</p>
<ol>
<li>Disable the main ForEach Loop.  We no longer manage it &#8211; the Parallel Loop Task does.  It enables/disables this as it fires each instance of this package.  If we left the loop enabled things would get very messy &#8211; you&#8217;d have sequential instances of the loop firing within each parallel instance of this package.  Loops in loops &#8211; very loopy.</li>
<li>Create a new <em>package-level</em> variable called &#8220;Iter&#8221; of type &#8220;int32.&#8221;   That&#8217;s right &#8211; we have a package and a loop variable now.</li>
</ol>
<p>That&#8217;s it.  That&#8217;s really all you have to do to take a sequential loop and turn it into a parallel loop. Your final package should look something like this:</p>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_final.png"><img class="alignnone size-full wp-image-476" title="ssis_parallel_loop_final" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/ssis_parallel_loop_final.png" alt="" width="707" height="321" /></a></p>
<p>Give it a quick test run and then go back and re-run that query to look at the dates and times.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> cozyroc<span style="color: #66cc66;">.</span>parallel_test</pre></div></div>

<p>Your execution dates and times should now be much, much closer to each other if not completely identical:</p>
<pre>log_id	group_table_id	group_id	table_id	execution_dts
4	2	1	2	2011-11-04 13:36:06.3230000
5	3	1	3	2011-11-04 13:36:06.3300000
6	1	1	1	2011-11-04 13:36:06.3300000</pre>
<p>See that?  Granted, this is a fairly pointless test case, but you get the idea.  By default the Parallel Loop Task&#8217;s iteration setting is &#8220;-1&#8221; (as many cores as you have).  You may want to play with this (or, better yet, expose it as a runtime configuration property) depending on your situation.</p>
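<p>If you&#8217;d rather put a number on &#8220;much, much closer&#8221; than eyeball the timestamps, here is a minimal sketch of a query that measures the gap between the first and last iteration of the most recent run.  It uses only the <code>log_id</code> and <code>execution_dts</code> columns shown in the output above.  The <code>TOP (3)</code> is an assumption that matches the three-row test case here; change it to however many iterations your loop fires.</p>

```sql
-- Sketch only: assumes the most recent run logged exactly 3 rows
-- and that log_id increases with each new log entry.
SELECT COUNT(*)            AS iterations,
       MIN(execution_dts)  AS first_fired,
       MAX(execution_dts)  AS last_fired,
       -- Gap between the earliest and latest iteration, in milliseconds
       DATEDIFF(millisecond, MIN(execution_dts), MAX(execution_dts)) AS spread_ms
FROM (
    -- Keep only the newest 3 log rows so earlier test runs are ignored
    SELECT TOP (3) execution_dts
    FROM cozyroc.parallel_test
    ORDER BY log_id DESC
) AS latest_run;
```

<p>For the three rows shown above, the spread works out to 7 milliseconds.</p>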
<p>Next up we&#8217;re going to step into how the CozyRoc DFT+ component can make your life easier by side-stepping SSIS&#8217;s age-old static design-time column mapping problem. Combined with the Parallel Loop Task, that&#8217;s when things really start to get interesting.</p>
<p>Further reading:</p>
<ul>
<li>SSIS For Loop Container (<a href="http://msdn.microsoft.com/en-us/library/ms139956.aspx">http://msdn.microsoft.com/en-us/library/ms139956.aspx</a>)</li>
<li>SQL Server Integration Services &#8211; SSIS &#8211; For Loop Container Samples (<a href="http://www.sqlis.com/post/For-Loop-Container-Samples.aspx">http://www.sqlis.com/post/For-Loop-Container-Samples.aspx</a>)</li>
<li>CozyRoc&#8217;s Parallel Loop Task (<a href="http://www.cozyroc.com/ssis/parallel-loop-task">http://www.cozyroc.com/ssis/parallel-loop-task</a>)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/edw/2011/11/etl-assistant-using-cozyrocs-parallel-loop-task/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Head starts are good things.</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/10/head-starts-are-good-things/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=head-starts-are-good-things</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/10/head-starts-are-good-things/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 20:13:58 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>
		<category><![CDATA[HIT Policy]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=447</guid>
		<description><![CDATA[According to a story in Healthcare IT News,  a study conducted on behalf of the Center for Public Integrity found that roughly half of the first batch of HITECH payments went to doctors and hospitals that had implemented EHRs before &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/10/head-starts-are-good-things/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>According to a story in <a href="http://www.healthcareitnews.com/blog/analysis-finds-hit-veterans-receiving-hitech-incentives">Healthcare IT News</a>, a study conducted on behalf of the Center for Public Integrity found that roughly half of the first batch of HITECH payments went to doctors and hospitals that had implemented EHRs before the incentives were announced.</p>
<p>Reportedly, the spokesman for one Congressman argued, “If providers have been paid for systems they already had in place, that seems to be an inexcusable waste of taxpayer dollars. It makes no sense for HHS to pay physicians for systems they already have.”</p>
<p>This statement seems to ignore several obvious facts.  The first is that EHRs cost money.  If institutions that had already invested received no payments, but those who waited did receive payments, the effect would be to give a competitive advantage to the technology laggards.  It would send providers a clear message never to invest their own money in IT.  It would effectively say, &#8220;wait long enough and the government will do it for you.&#8221;  That is not a message we want to convey.</p>
<p>Second, to receive payments, one must not only install EHRs but also use them in a &#8220;meaningful&#8221; way.  It is no surprise that organizations with a large head start reached meaningful use first.  If organizations that had invested years of effort and millions of dollars did not reach meaningful use before those starting from scratch, we would need to seriously question the criteria.</p>
<p>Finally, HITECH is not about whether providers buy EHRs, but whether they really <em>use them</em>.  This is only Stage 1.  Stages 2 and 3 are significantly harder: they require significant improvement and expansion of the ways that EHRs are used, even for the established leaders.  Even the HIT leaders are investing millions of dollars to reach Meaningful Use.  I know of one national leader in HIT, which already had a paperless EHR, that estimated it would roughly break even on Meaningful Use.</p>
<p>It is not reasonable or appropriate to evaluate the entire program based on who crosses the Stage 1 finish line first.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/10/head-starts-are-good-things/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

