My new project involves pulling data from 500+ remote sites into a target database, quickly and in parallel. Thanks to some proof-of-concept work done by my coworker Mike “The Devastator” Hurd (I made up the nickname), I got to learn SSIS (SQL Server Integration Services), and I’m cursing myself for not picking it up earlier.
What Is SSIS?
- Imagine you want to write a generic program that gets data from point A to point B and modifies it along the way.
- You want pluggable components that can talk to each other.
- You want anything that can be done in parallel to be done in parallel, i.e. read from two tables at the same time if you have the bandwidth.
- You want it to work on buffers rather than row by row, and to figure out buffer sizes dynamically so that they fit well with the page size on the computer (to minimize swapping).
- You want a GUI for configuring these components.
- You want every component to have a configurable “if things fail” behavior, such as redirecting failed rows to other components.
- You want it to be solid: no memory leaks, etc.
- You want to parameterize what goes into the components.
- You want it to be runnable from the command line, with the ability to override parameters and connection strings.
- You want the whole configuration to be saveable as an XML file or hosted in SQL Server.
- You want to be able to schedule it from SQL Server Agent (SQL Server’s “cron” thing).
- You want to be able to run it directly from C# via managed code.
Then you would write SSIS.
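To make the command-line point concrete, here is a rough Python sketch of how one run per remote site might be assembled. It assumes dtexec's `/F`, `/Set` and `/Conn` options; the package name `LoadSite.dtsx`, the variable `SiteId` and the connection manager `SourceDb` are made up for illustration, and the exact quoting dtexec expects for connection strings should be checked against its documentation.

```python
from typing import Dict, List


def build_dtexec_args(
    package_path: str,
    variables: Dict[str, str],
    connections: Dict[str, str],
) -> List[str]:
    """Build an argument list that runs one package file through dtexec."""
    args = ["dtexec", "/F", package_path]
    # Override package variables by property path (names here are hypothetical).
    for name, value in variables.items():
        args += ["/Set", f"\\Package.Variables[User::{name}].Properties[Value];{value}"]
    # Override connection managers by name. Assumption: a connection string
    # containing semicolons may need extra quoting on a real command line.
    for name, conn_str in connections.items():
        args += ["/Conn", f"{name};{conn_str}"]
    return args


example_args = build_dtexec_args(
    "LoadSite.dtsx",
    {"SiteId": "42"},
    {"SourceDb": "Data Source=site42;Initial Catalog=Sales;Integrated Security=SSPI;"},
)
print(example_args)
```

A list like this could be handed to `subprocess.run`, once per site, from a thread pool to get the "500+ sites in parallel" behavior.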
What Does It Look Like?
Initial Gotchas
SSIS requires some level of licensing, so this setup does NOT work:
- Install SQL Server Express
- Install Visual Studio 2012 Premium
- Install SSIS add-on
- Install SSDT (“database tooling for VS2012”)
Everything loads successfully, and you can debug packages within Visual Studio, but you cannot run them outside it via dtexec (the command-line runner for SSIS) or dtexecui (its GUI counterpart). Instead, you also need to:
- Install SQL Server Developer Edition or Standard Edition, either of which includes SSIS.
- Select Integration Services and Data Tools in the installer.
- You do NOT have to install the database engine.
- You might have to install an updated SSDT to get VS2012 support.
Then everything runs fine.
The other thing I quickly learned is that SSIS is very schema-dependent. It decides how many rows go into each buffer based on the size of the rows, so a varchar(MAX) column is very different from a varchar(250) column, and a package will fail to validate if the underlying data source it reads from does not match the expected schema exactly.
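A back-of-the-envelope sketch of why row width matters so much. This is a simplification, not the engine's actual algorithm: it assumes the data flow defaults are a 10 MB `DefaultBufferSize` and 10,000 `DefaultBufferMaxRows`, ignores the engine's minimum buffer size, and uses made-up row widths.

```python
# Assumed SSIS data flow defaults (verify against your version's documentation).
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000


def rows_per_buffer(
    estimated_row_bytes: int,
    buffer_size: int = DEFAULT_BUFFER_SIZE,
    max_rows: int = DEFAULT_BUFFER_MAX_ROWS,
) -> int:
    """Simplified estimate: as many rows as fit in the buffer, capped at max_rows."""
    if estimated_row_bytes <= 0:
        raise ValueError("estimated_row_bytes must be positive")
    return max(1, min(max_rows, buffer_size // estimated_row_bytes))


narrow_rows = rows_per_buffer(300)   # hypothetical row with a varchar(250) column
wide_rows = rows_per_buffer(8000)    # hypothetical row with a much wider column
print(narrow_rows, wide_rows)
```

With these assumptions the narrow row hits the row cap while the wide row is limited by buffer size, which is why the declared column width has to be known up front.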
The solution I applied was to extract the data with select cast(x as varchar(250)) as x, which guarantees the schema coming into SSIS.
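With many columns it may be easier to generate that query than to type it. A minimal Python sketch, assuming you keep a list of column names and the types you want SSIS to see; the table `dbo.Orders` and its columns are hypothetical, and the names are inserted as-is, so only use it with trusted identifiers.

```python
from typing import List, Tuple


def build_cast_select(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a SELECT that casts every column to an explicit, fixed type."""
    select_list = ",\n    ".join(
        f"CAST({name} AS {sql_type}) AS {name}" for name, sql_type in columns
    )
    return f"SELECT\n    {select_list}\nFROM {table}"


source_query = build_cast_select(
    "dbo.Orders",
    [("CustomerName", "varchar(250)"), ("Notes", "varchar(4000)")],
)
print(source_query)
```

The resulting text can be pasted into the source component's SQL command so the column types no longer depend on how each remote site declared them.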
Conclusion
Very glad to have this tool in my arsenal. Thanks Mike!