Having Fun with SSIS

My new project involves getting data from 500+ remote sites into a target database; in parallel, quickly.  Thanks to some proof of concept work done by my coworker Mike “The Devastator” Hurd (I made up the nickname), I got to learn SSIS, and I’m cursing that I had not picked it up earlier. 

What Is SSIS?

  • Imagine you wanted to write a generic program to get data from point A to point B, but modify it along the way.
  • You wanted pluggable components that could talk to each other.
  • You wanted anything that could be done in parallel to be done in parallel.  Ie, read from two tables at the same time if you have bandwidth.
  • You want it to work buffered rather than row by row.  But dynamically figure out buffer sizes so that they fit well to the page size on the computer (to minimize swapping)
  • You want a GUI that you can configure these components with.
  • You want every component to have a configurable “if things fail” state, to redirect failed rows to other components, etc.
  • You want it to be solid – no leaking memory, etc
  • You want to parameterize the stuff getting into the components
  • You want it to be runnable from a command line, with ability to override parameters and connection strings.
  • You want the whole configuration to be saveable as an XML file, or hosted in a SQL Server
  • You want to be able to schedule it to run from SQLAGent (SQL Server’s “Cron” thing)
  • You want to be able to run it directly from C# via managed code.

Then you would write SSIS.

What Does It Look Like?

image

image

image

image

Initial Gotchas

SSIS requires some level of licensing, so this does NOT work:

  • Install SQLEXPRESS
  • install Visual Studio 2012 Premium
  • Install SSIS add-on
  • Install SSDT (“database tooling for VS2012”)

It successfully loads, and you can debug packages within Visual Studio, but you cannot run them outside via dtexec and dtexecui (the command line thingies for SSIS).  Instead, you need to add:

  • Install SQL Server Developer Edition (which includes SSIS)  (or standard edition)
    • Select Integration Services & Data Tools
    • Do NOT have to install the engine.
    • Might have to install an updated SSDT to get VS2012 support

Then everything runs fine.

The other thing I quickly learned is it is very Schema-dependent.  The reason for this is it allocates row counts into buffer sizes based on the size of the rows – so a Varchar(MAX) field is very different from a Varchar(250) field – and the schema will fail to validate if the underlying data source it is reading from does not match perfectly.

The solution I had to apply was to do a select cast(x as varchar(250)) as x to extract the data to guarantee the schema coming in to SSIS.  

Conclusion

Very glad to have this tool in my arsenal.   Thanks Mike!

Tagged with: , ,
Posted in Code

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories
Tags
.net 3d 3d-printing 4k abc15 algorithms ames android anonymous types asp.net audio editing aws backup basecamp beatunes biorhythm bitcoin bittorrent blender blog boston marathon bpm c# caffeine campfire candycrush car carmax charity chiropractor cities-skylines clog clone codelouisville codepalousa coding coffee collaboration color run ComputerElbow ComputerVision configuration consulting cooking crash course crashplan crestwood cycling dabda dan dapper DataSet ddl diabetes dictation dotnetcore dotnetmud downtown e-cycling elite excel exercise expiration facebook feature-branching firefall flipflops Flow FL Studio focus food forecastle fortresscraft franklinplanner gadgets game-design games git github google docs google maps gopro gps grandpa greenshot hack half marathon headless health heart rate hiren ignew integration testing interop inventory ios ipad itunes javascript jobs karma kdf keyboards keys kittens lamont laptop lavalamp lego life lifehack linq linqtotwitter linux los angeles louisville mandelbulber massage therapy mastery-teaching maths merge metformin Minecraft miniature modeling monitor mud muhammad ali institute music mvc mycartracks netfabb nexus10 node nostalgia nutrition nwipe oldham county grand slam opal openjscad openscad owin pacedj paper mockup pepakura performance photoscan politics pomodoro postgresql powershell premiere prius process product-management project-management qa ReFS resharper review rmi roman road 5k RSI rubiks running samsung 700t sandals schedule scooter scribblelive selenium service shapeways sleep slic3r sneakersync snot software software-engineering solidoodle soylent spacegame speaking sql sqlite SSDT SSIS standing state-machine stayfocusd stonehearth sunset tablet teaching team teamcity teamtreehouse terraform tesla testing tfs time timelapse torque touch tracks trs80 Tuple tutor twitter ubuntu unit testing utilities video video editing visual studio vscode vsvim warp stabilizer windows 8 windows home server wordpress wpf xml
Archives
%d bloggers like this: