Biography:

Location: Dayton, OH
College: Cedarville University, Cedarville, OH
Degree: B.A. Management Information Systems (2002)
Certifications: Microsoft Certified Professional (MCP) 2002

Interests:

Music, Playing Guitar, Working on Cars, Home Improvement, Photography, Computers, Website Programming, Driving in the snow, NHL Hockey, Sand Volleyball, Settlers of Catan board game

Favorites:

Foods: Lasagna, Carrot Casserole, Chicken Parmesan
Sports: Hockey, Football, Volleyball, Soccer
Desserts: Peanut Butter Pie, Peanut Butter Passion ice cream

Employment Information:

Current Resume: Download PDF Resume


Contact Me:
Use this area to send me a note or message:
Name:
Email:
Comments:
Security Code:

Fun links you should probably check out:


Fun games you might want to play:


Current Code Samples:


Future Code Samples:

  • LINQ-to-XML Dynamic Where Clauses Using Ternary Operators
  • C# List delegate alphanumeric sort
  • Using HTML form input arrays with ColdFusion 7
  • Capture selected text or cursor position in a TEXTAREA with JavaScript
  • Show/Hide all SELECT input items on a page with JavaScript
  • Show/Hide element with opacity level fading using JavaScript
  • CAPTCHA form-protection of images using PHP and GD 2.0 library
  • Resizing images using PHP and the GD 2.0 library
  • Rotating images using PHP and the GD 2.0 library
  • Cropping images using JavaScript, PHP, and the GD 2.0 library
  • Add image watermarks using PHP and the GD 2.0 library
CODE TUTORIAL: SSIS Excel Source, Columns with more than 255 characters, DTS_E_CANTGETBLOBDATA
Recently, I've been working more with SSIS Packages in SQL Server 2005, this time doing some Data Flow Transformations from Excel template files into SQL Server backend.

Today I want to talk about (3) important gotchas when using Excel Source with SSIS data transformations in a Data Flow Task.



Part 1: Mixed Datatypes In Excel Column

One of the more annoying "features" (if you can call it that) of Excel is in how it handles mixed-datatypes within a single column. Say I have an Excel file with a column of data that looks like the following:



Or maybe even I've applied an Excel format to the column to force decimal positions or zero-filling:


When Excel saves this document, the metadata that it will write for this column will attempt to scan the file and determine the data type. SSIS Excel Source also performs this same scan of the metadata of the Excel file when you set up an Excel Connection Manager. The frustrating part, is that Excel will make determinations on datatype FOR you, without allowing you to choose or force that you want to treat everything in the Excel file as straight TEXT. A workaround I found for this issue was to modify the ConnectionString to the following:

ConnectionStrings.com Excel 2000

Original OLE DB ConnectionString:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes";
New OLE DB ConnectionString:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";

"HDR=Yes" - This indicates that the first row of each worksheet in this Excel file contains column names or header text. "HDR=No" would indicate that the first row of each worksheet in this Excel file contains data, not header text.

"IMEX=1" - This tells the OLE DB driver to read mixed datatype columns as text instead of trying to use what Excel had embedded into the metadata for this file. This is the setting that solves the issue with mixed-datatype columns in an Excel Source in SSIS!

VERY IMPORTANT!
If you end up using IMEX=1 in your Excel Connection Manager connection string, you MUST ensure that the Excel file you are importing has had all columns with mixed-datatypes formatted to "Text" in Microsoft Excel. Trying to use IMEX=1 on a mixed-datatype column that has individual cells formatted to other formats (like float, double, int, datetime, longtext, special, currency) will fail with a very cryptic SSIS message. The solution is to open the XLS file in Microsoft Excel, and select all of the columns that have mixed-datatypes in them, formatting them to be "Text" instead of their current format. (This frequently will happen when there are copied and pasted cells from other Excel files that end up in the Excel file you are trying to import)

WRONG:

RIGHT:

(green triangles in the top left corner of cells that are all numeric are your best friend)


Part 2: Columns With More Than 255 Characters


By default, when you add an Excel Source to your Data Flow Task in SSIS, every column from the Excel source document will be listed with the following properties:


As you can see, every column is set to SSIS's DT_WSTR datatype, and given a fixed length of 255 characters. You certainly can see how this will become frustrating when your source data exceeds 255 characters in length, as SSIS will throw errors and validation warnings about mismatches between the metadata in Excel and what SSIS interprets.

The best way to solve this issue is to manually force SSIS to use a different datatype (DT_NTEXT) for any columns in your Excel Source that you know will contain more than 255 characters. Here is a screenshot illustrating what that will look like:


Some folks might think that you can just change the data length of the DT_WSTR field to a higher number, like 2000 or 4000 characters. DT_WSTR is limited to 4000 characters total, and if you use any Expressions on that column that manipulate the string or substring it within SSIS later on in the Data Flow, you will get all kinds of "possible truncated value" warnings that will frustrate you. Your best bet is to convert those fields to DT_NTEXT datatype for both the "External Columns" and "Output Columns" within the Excel Source Advanced Editor.


Part 3: Columns With No Data In First 8 Rows Cause DTS_E_CANTGETBLOBDATA Error

This one is most frustrating and took the most searching to find the answer. After performing the fix mentioned in Step 2 to my own Excel document (one in which I had 2 columns in my file that could be several thousand characters) I noticed that when the first 8 rows of the file did not EXCEED 255 characters, that SSIS would throw the following cryptic error message:

"Failed to retrieve long data for column [column name]"

This one took some searching, but I discovered that the only fix here is a local registry hack to change the default way Microsoft Jet 4.0 scans data in an Excel file. What I discovered with my 8 rows estimate was right on with the default configuration of Microsoft Jet. There is a registry value "TypeGuessRows" that Microsoft Jet uses when opening an Excel file where it scans the first 8 rows of a file to try to determine what type of data exists within each column in an Excel file. This was reported under a Microsoft KB article: Q189897


Part 4: Create an Excel Template that has hidden 2nd row

The last part to solving this issue was creating a 2nd row in my Excel file worksheet that had bogus data in it. Paying close attention to the columns in my Excel file that could potentially have more than 255 characters of text content, I made sure to add bogus text of at least 256 characters into those fields for row 2. I then right-clicked on the "2" on the left side of the row and was able to "hide" the row from view (the data still comes over in the Excel Data Flow). My worksheet now has a Row 1 and then a Row 3 with no Row 2.

The other trick to this issue is that you MUST completely delete your Excel Source in your Data Flow, so that your "mixed" columns will have the correct datatype on them. You can manually try to select NTEXT datatype for the "External Column" and "Output Column", but until you delete and re-create your Excel Source, your changes will not work as desired. One of the frustrating things about BDIS or Visual Studio when editing Excel Data Flow Tasks.

I then saved the Excel file and use that as my input file for my SSIS package. It worked perfectly, columns with no text data in them came across as DT_WSTR datatypes (even if they were blank) and that same column could also now contain text that was longer than 255 characters and have that text come across into my Data Flow in SSIS!

This was the best of both worlds and seems to be a stinky solution, but one that finally works! Hopefully this will help you prevent some head scratching in the future...

- Justin Tubbs