A Day at the Office: SQL Loops, Dapper Query Multiple

My wife pulled me out of the office today to take her to the State Faire.  However, right now, she’s meeting with her coffee girls.  So: I can put some stuff that’s been brewing in my mind into a blog post.

Working with somebody I had not worked with before, I am learning a lot of interesting things:

SQL Loops

We’re working on getting data from hundreds of sites, pooling it, aggregating useful information, and then reporting it.  The size of the pooled data is around 25 million rows, and will grow by several million a year.  

When we change how the aggregation works (new math, new metrics), we have to delete all the aggregated data and re-aggregate.    Due to foreign keys and such, truncating the tables is not an option (from a sproc, at least).  A straight delete took forever and gave us no feedback.

My coworker used this construct to provide a way to measure delete progress:

Version 1:

declare @TrooperID int = 0   -- assumes TrooperIds are positive; start below the smallest

-- delete one trooper's statistics at a time, so progress is visible
while exists(select top 1 TrooperId from Trooper where TrooperId > @TrooperID)
begin
  select @TrooperID = Min(TrooperId) from Trooper where TrooperId > @TrooperID
  DELETE FROM [aggregated].[Statistics] where TrooperId = @TrooperID
end

I have never done a while statement in SQL before.  Very elegant.  It got around most of the problem of trying to delete too much without any feedback. 

Even better, the next time I looked at the source, I found something else:

Version 2:

WHILE (1=1)
BEGIN
    -- delete in chunks of 10,000 rows until nothing is left
    DELETE TOP ( 10000 ) FROM [aggregated].[Statistics]
    IF (@@ROWCOUNT = 0)
         BREAK
    --WAITFOR DELAY '00:00:02'  -- optional throttle between chunks
END

Even simpler!

A Case for Simpler Dependency Injection

I’m used to code like this:

private readonly DbConnectionProvider _dbConnectionProvider;
private readonly ITrooperGetQuery _TrooperGetQuery;

public TrooperAccessGetQuery(DbConnectionProvider dbConnectionProvider, ITrooperGetQuery TrooperGetQuery)
{
	_dbConnectionProvider = dbConnectionProvider;
	_TrooperGetQuery = TrooperGetQuery;
}

Any dependencies your code has on other components get “injected” into your constructor.   This is usually used along with a Dependency Injection library; to call this code, you would do something like:

var query = IoC.Get<ITrooperAccessGetQuery>();

Or, alternately, callers would in turn take an ITrooperAccessGetQuery parameter in their constructor, and rely on whoever calls them to make the decision…

.. bubbling up all the way till you get to some magical code in the app that ties everything together.  Somewhere else.   This has always annoyed me.  

With this approach, it seems that a) constructors get really large and unwieldy, and b) “what’s going on” is spread out between where DI is set up and where it is consumed. (*)

(*) Some would argue that’s the whole point of DI. 

Anyhow, I found this style of code, which needed access to the TrooperAccessGetQuery:

private ITrooperAccessGetQuery _tagq;
private ITrooperAccessGetQuery TrooperAccessGetQuery
{ get { return _tagq ?? (_tagq = new TrooperAccessGetQuery()); } }

What this says: “Hey, my system is simple, I don’t need a master system putting everything together. My components know what components they normally depend on, but you have a chance to override. If you call the default constructor, I’ll give you a component with all default components.”

Thus, I added a default constructor to my component:

public TrooperAccessGetQuery() 
    : this(new DbConnectionProvider(), new TrooperGetQuery()) { }

And life proceeded.   It’s pretty, and avoids ugly constructors, arcane app bootstrap code, and having to choose a dependency injector.
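
Calling code then gets to choose how much control it wants.  A quick sketch of both paths (FakeTrooperGetQuery here is a hypothetical test double, not something from the real project):

// take all the defaults:
var accessQuery = new TrooperAccessGetQuery();

// or hand in specific dependencies, e.g. from a unit test:
var testQuery = new TrooperAccessGetQuery(
    new DbConnectionProvider(),
    new FakeTrooperGetQuery());   // hypothetical; implements ITrooperGetQuery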

Dapper Query Multiple

I had not seen this construct before.   Useful:

public static Counts Get()
{
	const string sql = @"
select  count_big(1)
from    imported.TrooperStatistics with(nolock)
where   is_summarized=0

select  count_big(1)
from    imported.TrooperStatistics with(nolock)

select  count_big(1)
from    aggregated.StarDateHash with(nolock)
where   is_aggregated_total=0

select  count_big(1)
from    aggregated.StarDateHash with(nolock)
where   is_aggregated_time_interval=0";

	var connectionProvider = ObjectFactory.GetInstance<DbConnectionProvider>();

	using (var connection = connectionProvider.GetOpenConnection())
	using (var multiple = connection.QueryMultiple(sql))
	{
		return
			new Counts
			{
				// the Read calls must be consumed in the same order as the SELECTs above
				NotSummarized = multiple.Read<long>().First(),
				Total = multiple.Read<long>().First(),
				NotTotaled = multiple.Read<long>().First(),
				NotCalculatedTimeInterval = multiple.Read<long>().First(),
			};
	}
}
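
Counts itself isn’t shown above; presumably it’s just a small DTO whose property names match that mapping, something like:

public class Counts
{
    public long Total { get; set; }
    public long NotSummarized { get; set; }
    public long NotTotaled { get; set; }
    public long NotCalculatedTimeInterval { get; set; }
}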

I learn something new every week. Thanks, JM!

Primer on SSDT / Checking Schema into Source Control

I’ve been learning a lot of new things with this client – SSIS, SSRS, and now we’ve also implemented a workflow using SSDT.   In short:  This is a well supported way to check your database schema and seed data into source control.  So, looking back, what are the resources that I wish I had known going in?

Installation

Near as I can tell, SSDT is installed if/when you install SQL Server Express.   There’s a basic version also installed if you install VS2012 Premium and up, supposedly.

It can also be installed directly via http://msdn.microsoft.com/en-us/jj650015

Once installed, you’ll be able to do Schema Compares, Data Compares (useless to me?), and database projects (*.sqlproj).   As a lot of people don’t seem to have it installed, I usually put the database project in a different solution than the main project solution.

Capturing an Existing Database into a Database Project

  • Create a database project; if you choose to start blank at first, then..
  • Right-click the solution, do a Schema Compare, SWITCH DIRECTIONS, and Update Target
  • Hint: Don’t save any user objects until you are very comfortable with it.


As of the current version, I no longer have to have one database project and a separate server project (as I did when I looked at it in 2010 or so).

Making Changes

You can either make the changes in the database and then suck them into the project, or edit them directly in Visual Studio – either graphically or in SQL.


A build will verify every reference – for example, sprocs referencing columns, etc.   There’s a bit of noise around incorrect casing, which you can turn off.

When you check the changes into source control, they are saved as the T-SQL version.  Very self-documenting, diffable, etc.

Applying Changes to a Local Database Interactively

Use schema compare, select/deselect what you want to update, and hit Update.   

This does NOT run pre- and post- deployment scripts.

Applying Database Changes Automated / Without Visual Studio

Compile the project to get a .DACPAC file.   Then use SQLPACKAGE.EXE, usually located at C:\Program Files\Microsoft SQL Server\110\dac\bin\sqlpackage.exe.  The arguments below are split onto separate lines for readability.

“If I publish it, what will it do?”

  • /action:DeployReport
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /OutputPath:differences.xml
  • /p:DropObjectsNotInSource=false

“What is absolutely everything that is different between the two?”

  • /action:DeployReport
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /OutputPath:differences.xml
  • /p:DropObjectsNotInSource=true

“Do it. (but don’t delete those users!)”

  • /action:Publish
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /p:DropObjectsNotInSource=false
  • /p:BlockOnPossibleDataLoss=false

It runs it with pretty colors in a console app.

With SQL Server Management Studio, Caveats

In SSMS, under database, tasks, there is “Update Database-Tier Application”.  This does effectively the same thing, except you don’t get to choose the options above, so it usually does too much.  However, you can save off a change script and then walk the change script by hand.

Handling Data Changes

This guy wrote an excellent blog article about it:  http://schottsql.blogspot.com/2012/11/ssdt-pre-and-post-deploy-scripts.html.  I extended it a little bit:

  • Created a ScriptHistory table with a script name (PK) and when it was run, to avoid rerunning
  • This is for the one-time refactors that move data from an old location to a new one (or go from inferred settings to explicit settings): they cannot safely be run again, and there is no way to detect from the data whether they have already run.


Best Practice for Security / Users / etc

I have not had a chance to do this yet, but the general idea is to create roles for the permissions and have SSDT deploy the roles, while user membership is unique per server and handled manually.  (We’re talking about situations like: I am developing in COMPANY1’s domain, but the production server is in COMPANY2’s domain, and Windows Auth is being used.)

In Conclusion

I watched my dba-idol Richard Svatos struggle with this in 2003.

I’ve tried to solve it in a variety of ways.

I’m glad there is now a usable, well-supported solution in the ecosystem in which I am flourishing.

Ubuntu, Headless Mediasmart Server, Software Raid, Samba

I am now several days into this experiment.  It’s not working quite as I had hoped, but it is working.  So here’s a roadmap/dump, with links along the way:


Headless Server Install

I had an HP Mediasmart EX-485 Home Server that went unresponsive.  It did not appear to be a drive failure; the O/S just started hanging.    After rescuing everything, I had this pile of 4 hard drives sitting around – 750G, 1TB, 1.5TB, 2TB – and I was wondering what I could do with them.   Well, build a server of course!   Hey.. I have this hardware sitting around.. it used to be a Mediasmart EX-485 Home Server…   But it doesn’t have a display, or keyboard, or mouse.   There are places you can order some add-on hardware for that, but it would cost me money.

I researched a couple of options, and the winner was: hook up a hard drive (I chose the 750G) to another computer, install Ubuntu on it, additionally install sshd, and only then transfer it to the Mediasmart chassis.  Luckily, most driver support is built in to the Linux kernel, so switching hardware around is not a problem.

Then I downloaded PuTTY on my Windows desktop machine and used it to connect to the server (formerly named MGZWHS, now named diskarray).

Adding in Drives and Making a Pool

I booted the server up and checked what drives were available (sudo lsblk), and it only showed me /dev/sda and its partitions.   As an experiment, with the server up and running, I hot-plugged another drive and looked again.. sure enough, there was a /dev/sdb present now.

I plugged in all the drives, then went and did some research.  This led me to LVM (logical volume manager); there were a ton of examples out there, but all of them seemed to use hard drives of identical sizes.

At first, I thought I needed to repartition the drives like this guy did – so that I could pair up equal-sized partitions and then stripe across the partitions – but once I got into the fun of it, it became much simpler.

  • create PVs for each disk
  • create 1 VG covering all the PVs
  • create 1 LV with -m 1 (mirror level 1) on the VG.  This command went crazy and did a bunch of stuff in selecting the PEs to use for the different legs, and the mirror log…
  • create an ext4 file system on the LV

The -m 1 “peeks” into the physical volumes and ensures that any piece of data is backed up to 2 separate physical volumes – and as my physical volumes are all disks, they’re on different disks.

Surprise, though – it took about 2-3 days for the mirroring to catch up between the drives.   Screen dumps available here:   http://unix.stackexchange.com/questions/147982/lvm-is-this-mirrored-is-copy-this-slow

Creating a Samba Share

I then approximately followed these directions to create a Samba share.  However, it’s not quite right yet – the permissions of created files vs. the permissions of other files don’t quite match.   Still, on my Windows machine I can definitely see the files on my server, and it’s fast.

NB: You can see the .sync files and the lost+found of the ext4 file system

Syncing Via BtSync  // stuck

I then followed these directions to install btsync, and then attempted to sync my 1.2TB of data from my Windows box to diskarray.  It got part of the way there, 385MB .. creating all the files as user “diskarray” (herein lies my Samba file sharing problem) – however, it’s gotten stuck.   Windows btsync knows it needs to send 1.1G of stuff to diskarray .. they both come online .. and nothing gets sent.     There are ways to debug this – I know how to turn on debug in Windows, but have not yet followed the directions for Linux – and eventually, I hope to fix it.

// todo: Simulating Drive Failures

I did attempt a drive failure earlier, but that was before the copy was 100% done – so it was, shall we say, a little hard to recover from.  Later on, I plan on taking one of those drives out while it’s running and seeing how I can recover.   Maybe even upgrade the 1TB to a 2TB, if I ever take apart the external hard drive that I used to have on the TIVO.   What should happen is the LV would degrade from “mirrored” to “normal”, and the unmirrored partitions would become unused.  We shall see.

Headless Install Ubuntu on Mediasmart Server

I’m trying an experiment:

  • Connect up a 750G drive via my SATA to USB to my machine
  • Use VMWare Player
  • Create a VM
  • attach the VM to this 750G USB device
  • install Ubuntu Desktop from ISO (Desktop = 32bit, target is a 32bit server with 2G ram)
  • When done, transplant drive into headless server
  • Hope it boots up and gets to the network so I can ssh into it without a display or keyboard.

According to my research, there is a gotcha step, where I tell it that “every time I boot, rewrite what is eth0 and what is eth1, etc”.  See the docs here if needed:  https://www.google.com/search?q=70-persistent-net.rules  (I have not needed to do this so far)


Problem Number 1.  When I reboot:


Then it got worse: it failed to recognize the USB-connected drive as a valid boot source, and I got “Operating System Not Found”.

Ah well!  Let’s try it anyway.  Disconnect the drive, stick it in the Mediasmart server in the first spot, and power on ..

  • It sure sounds like it’s booting.  Lots of spinning going on with that drive.
  • I can now ping it! 
  • I can confirm that the lights on the back of the machine coincide with ping packets!

Time to download PuTTY and either telnet or ssh into my new diskarray computer!

Problem: neither telnet nor ssh work – I guess the servers are not up and running.

The next step would be to put the drive in some hardware that can successfully boot it, install telnetd and sshd, and then move it back into the headless server.    I only have one machine at home that can do that, and that’s my main computer.. which I need for work here in a bit, so it’s not happening tonight.

However, maybe I could look into setting up Ubuntu on a bootable USB flash drive?

Figuring out a Sync strategy

At first I was going to draw this all out – I did, with VISIO, but the drawing isn’t quite pretty enough.  It does reflect the solution I’m going towards.


I have the following computers, with the following uses:

  • “Big” Computer – my main workhorse desktop at home.  It does a lot of Video Editing, Audio Mixing, etc.
    • Currently has USB 3TB drive #1 mounted
  • Surface #1 – my most portable computer.  Usually just edits things in the cloud, occasionally gets an offload of an SD card while I’m on the road.
  • Laptop #1 – My “I would program on this laptop” laptop, which I mostly use for Remote Desktop into my work computer
  • Laptop #2 – this is the laptop that controls the 3D printer.  It has a lot of 3D printer tools on it.
  • Work Computer – for work stuff.   Not my computer
    • Currently has USB 3TB drive #2 mounted

I pretty much use the pattern of:

  • C:\2014\<project Name>\  is where I put all files for a project.
  • I keep the same project name between computers if I move the project around.  For example, when I was doing book design for Dad’s book, that started on Laptop #1, then moved to “Big” Computer
  • I consider “iTunes” to be a project, as well as my local instance of my Dropbox folder.  Unfortunately, these are still in 2013. 

My needs:

  • When I’m working in C:\2014, it needs to be fast and stable.
  • When I’m working in C:\2014, it eventually needs to get backed up to the big backup
  • Not all C:\2014 projects should be on all computers.  In fact, almost NONE of them should be on the Surface; it only has a smallish SSD.  Same deal with Laptop #2, which has 10G free or so after the O/S.
  • The Big Backup should be offsite-level backed up.

Limitations

  • A computer cannot btsync with itself (yet)
  • An R/O folder (like the phone backup destination) cannot be inside an R/W folder.

Options

I thought about something like this:

  • BigComputer 3TB \BtSync1  syncs to Work 3TB \BtSync1
  • I create a sub-sync of WORK:\BtSync1\users\sunny\2014 to BigComputer:c:\2014

It would work, but it would be a bit ugly.  Lots of hops going offsite and then back home to get things backed up.

Winner?

I believe I’ve decided on the following:

  • The large 3TB drives maintain their own sync pool.  
  • The local folders (C:\2014) MIGHT maintain their own sync pool, on a per-project basis.   For example: 3dModels between my big computer and the 3d printer laptop.
  • Every project should end up at Sunny’s Big Computer
  • I’ll use Robocopy on a schedule to bridge the final gap from Sunny’s big computer c:\2014 to USB:\BTSync1\users\Sunny\2014
  • When I bring another machine online at home which can hold large drives, I’ll add it to the big sync pool as well to have a local (faster) backup. (Offsite can lag behind by days when we’re talking video files)

For things like backing up Mom’s important stuff, I’d probably create a USB:\BTSyncMom folder and have that be its own pool, not make it sub- or super- to any other sync pool.   Or, continue to use Crashplan there.

Something I Don’t Yet Have

Windows Home Server gave me the ability to restore a machine to a specific state, covering all the installed software on a machine.  I could do that using a copy of Ghost, or some other Hiren tools.    I don’t yet have a plan for that.  On the other hand .. reformatting a machine these days is “no big deal” for me, with all the data being out in the cloud somehow anyway. 

Fun with BitTorrent Sync

BitTorrent Sync can be downloaded here:  http://www.bittorrent.com/sync

Large Folders

I now have two external hard drives, one at home, one at work, synced with each other:


  • I first xcopy’ed one external drive to the other.   This took about 20 arns
  • I then hooked one drive up, and pointed BitTorrent at it.  It started indexing.
  • I took the other drive to work, pointed Bittorrent at it there with the same shared folder secret, and it too started indexing.
  • The two instances of BitTorrent spent quite some time chatting with each other over the network:  “I have this file!”  “I do too!”  .. getting to the point where they agreed that they were in sync with each other.   This chatting and indexing phase took perhaps 2 days.
  • They are now in sync.  They still “chat” at each other every now and then, but the network traffic is minimal. (I think)

Robustness

I then played with the robustness of the sync.  First, I renamed a folder on one –  did it transmit it as a rename?  Why, yes it does:


I then turned BitTorrent Sync off on one of the sides (but not the other), and did some more renaming.   I got a mixed bag of results:


What it looks like to me is: if BtSync is listening to a folder, it remembers actions taken and can catch other clients up to those actions; however, if BtSync is turned off, it treats its new indexing data as “new stuff that happened” (i.e., it doesn’t know there was a rename) and thus deletes and adds files as necessary.    In the end, the two repositories are in sync, and the deleted files are copied over to the .SyncArchive folder.


iPhone Backup

I turned on the “sync my photos from my iPhone” feature.  It creates a BitTorrent Sync source, with only a read-only secret for others to consume.


  • If I delete a photo on my computer, it is NOT resynced from the phone.   
  • If I delete a photo on my phone, it is NOT deleted from the computer
  • The files are not date/time stamped like they are with the Dropbox export.
  • It only synchronizes when I open the BitTorrent Sync app on the phone.  There is no option for background refresh.  (+1)

So far so good, I like it.

Nested Folders

The idea is this:

  • D:\BtSync1 is synched from A to B
  • On the same computer, can I additionally sync D:\BtSync1\Users\sunny\2014\3dmodels from A to computer C as C:\2014\3dmodels?

This way, I can keep specific projects (in this case, 3D printing stuff) synced between two computers, while having the data synced to the offsite backup as well.

Answer: Yes, as long as the parent (D:\BtSync1) and the child (+Users\Sunny\2014\3dmodels) are both Read-Write secrets.    I.e., I could not place my phone backup folder inside BtSync1, but I can do what I want above.

Testing it out:

  • Deleted file “LaundryBasketHandleV1.stl” on A (not shown by name in the log file below). 
    • Deleted on B, and C
  • Deleted file “LaundryBasketHandleV1_fixed.stl” on C (“MOLLY-L2013”)
    • deleted on A and B
  • Created a new folder “coldwellbankercoaster” with a bunch of files on B (“SUNNYDESKTOP”)
    • copied to A and C.


I like it so far.  Dude, I would pay $100 (one time, multi-install) or $15 (per computer) for this software.  And for right now, it’s free!

I am brewing a plan for total world domination.  My world, anyway. 

Rescuing Files from Windows Home Server V1

My Home Server died a curious death.  I didn’t lose any drives.. but the operating system started hanging.   No more file copies, no more logging into the console, no more responding to MSTSC.   I had run this for close to 8 years, I think?   It’s hard to remember how far back.  We bought it sometime around when I got married, with Christmas money.  It has saved us 3 or 4 times with computer backups.

First job: make sure I had a current backup of everything that was on the server.  Could I get to the raw files?   I popped a drive out, attached it to my desktop PC using a SATA-to-USB adapter .. and..

Access Denied!   I had to elevate my security level to gain access to the drive.  Not a problem; I found Beyond Compare and ran it as Administrator.

The other part of the equation was where to save the files.   I had an 8-month-old backup of the home server on an external hard disk that I brought back home from the office.

This is what Beyond Compare found for me:


  • The files on the WHS drive are under x:\DE\shares.
  • DE stands for drive extender.
  • The DE folder is normally marked hidden. 
  • There’s actually another partition on the root of the home server, that has the folder structure for everything, and has soft-links to these files across drives.
  • The folder on the right, G:\2013-12-29_16…   is how Home server backs up files to an external hard drive.
  • Everything is done via NTFS (or maybe it was FAT32 for some of it).  Very simple, straightforward stuff.  I love it!
  • I put Beyond Compare in “Left Newer and Orphans” mode – and it found stuff on the drive that was not yet in the (much larger) backup.
  • Beyond Compare did a stellar job of copying the files.


Next up:  BitTorrent Sync.

Mocking Time at 1000x speed

I had a complicated bit of logic that involved multiple threads, waiting for the right time, checking for time expiration, etc.   I wanted to unit test it, which meant mocking Time.  And not in a simple way, but in a complex, works-across-threads way.

This is what I ended up with:

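The shape of it is a small time abstraction that the logic takes as a dependency instead of calling DateTime.Now directly.  A minimal sketch of that idea (the interface and member names here are my guesses, not the original code):

using System;
using System.Threading;

public interface ITime
{
    DateTime UtcNow { get; }
    void Sleep(TimeSpan duration);
}

// Production version: a thin pass-through to the real clock.
public class SystemTime : ITime
{
    public DateTime UtcNow
    {
        get { return DateTime.UtcNow; }
    }

    public void Sleep(TimeSpan duration)
    {
        Thread.Sleep(duration);
    }
}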

That’s pretty simple.  The fun stuff happens in the Stub version:

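The Stub version keeps a single virtual clock that all the threads share, and advances it at some multiple of real time.  Again a sketch, with the names and the exact mechanism assumed:

using System;
using System.Diagnostics;
using System.Threading;

// Test version: virtual time, shared across threads, running at e.g. 1000x real time.
public class StubTime : ITime
{
    private readonly DateTime _virtualStart = DateTime.UtcNow;
    private readonly Stopwatch _realElapsed = Stopwatch.StartNew();
    private readonly double _speed;

    public StubTime(double speed)   // e.g. 100.0 or 1000.0
    {
        _speed = speed;
    }

    public DateTime UtcNow
    {
        // virtual "now" = start time + (real elapsed time * speed)
        get { return _virtualStart.AddTicks((long)(_realElapsed.Elapsed.Ticks * _speed)); }
    }

    public void Sleep(TimeSpan duration)
    {
        // sleep only the scaled-down real time: a 10-minute wait takes 600ms at 1000x
        Thread.Sleep(TimeSpan.FromTicks((long)(duration.Ticks / _speed)));
    }
}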

Thus my logic component (which spawns multiple threads, each taking a section of work to do, where each unit of work has a “cannot start before” and a “must not end after”) runs at 100x or 1000x normal speed under the unit tests, and the simulation stays accurate across all the threads.
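
A worker honoring those constraints only ever talks to the abstraction, so the same code runs against SystemTime in production and StubTime in the tests.  A rough sketch, with the shapes assumed:

using System;

public class WorkUnit
{
    public DateTime CannotStartBefore { get; set; }
    public DateTime MustNotEndAfter { get; set; }
}

public class Worker
{
    private readonly ITime _time;

    public Worker(ITime time)
    {
        _time = time;
    }

    public void Run(WorkUnit unit)
    {
        // wait (in virtual time) until the unit is allowed to start
        while (_time.UtcNow < unit.CannotStartBefore)
            _time.Sleep(TimeSpan.FromMilliseconds(250));

        // ... do the actual work here, checking the deadline as it goes ...

        if (_time.UtcNow > unit.MustNotEndAfter)
            throw new TimeoutException("work unit ran past its deadline");
    }
}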

Flow, In the Zone

Blessed with a long drive, I caught up on some of my favorite podcasts.  I specifically listened to “Getting into the Zone with Mark Seeman”, and it turned on a light bulb for me.  (It’s 53 minutes long; there should be a transcript in a few more weeks.)

Lately, I’ve been struggling with “Do I want to stay technical, or should I start looking at changing focus?”  I know I would, and do, make a very good team technical lead.  I am very good at interpreting client needs and designing architectures.    And I’m not as eager, sometimes, to dive into new stuff as some of my co-workers.    I wrote a long blog post about it, did not publish it, solicited feedback, and I’ve been sitting on it for a bit.

Then I started working on my current project, and I’ve been having a blast.  Why?  Because I’m learning (SSIS and Dapper, in this case).  And what I’m finding is, as I accomplish bits of code with these new things, I’m doing it “in the zone”.  Or in “Flow”.

How can I set myself up for more occurrences of Flow in my daily work? 

In addition to making the list of things that I could do during the day, I’m trying to state what I find rewarding in the task as well.   The rewards need to be swift – something that I could do in a Pomodoro.   Here’s a sample list:

  • Write documentation.  Negative look: Boring?  Reward: Regenerate the TOC! (commit)
  • Write coupling code to copy web service data.  Negative look: Plumbing.  Reward: Commit! Every! Table!  And Run It! Every! Table!
  • Write complicated logic thingy.  Negative look: Bound to be buggy.  Reward: Extract it to a component, make it testable, and write Lots! of! Tests!
  • Status report.  Negative look: Yet again.  Reward: Generate Awesome Graph!  5 Sections!  Each gets an Exclamation!
  • Write SSRS report (I don’t know SSRS yet).  Negative look: Fear, I’ll get it wrong, etc.  Reward: Time Myself!  How fast can I watch the Pluralsight video? 2x? 1.5x?  Reassess after watching the video.

Then, as I write down priorities, I actually write the priorities next to the rewards, so that when I’m choosing the next task, I’m not choosing the task.. I’m choosing the reward I’m trying to work towards.

Going to give it a shot. 

Day at the Office: SSIS, DiffGrams, XML, Dimension Loading using Dapper

Part of my project is to pull configuration data from a web service provided by my client and save it locally for our app to use.  The client wrote a very simple .ASMX, which returns a DataSet. 

My first thought was to use SSIS to extract the data from the web service and then do a dimension load into my local tables.   But I hit a problem: when the SSIS control task saved the DataSet, it saved it as a diffgram.

The result of this is that when using the SSIS Data Flow task for “pull me some data from an XML File”, while it does get the schema correct, it gets 0 rows, every time.

There is a workaround to use XSLT to transform this XML so that it works correctly:  Examples here and here;  but I discovered this a bit late in the game.  I ended up writing my own code to retrieve the data from the web service, and do a simple dimension load (insert, update, delete) against my local table.  The code that calls it looks like this:

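In spirit, it is one call per dimension table; this is a sketch only, with the table, columns, and helper names made up for illustration:

using (var connection = connectionProvider.GetOpenConnection())
{
    // Trooper is the example dimension here
    var troopers = GetTroopersFromWebService();   // hypothetical helper that maps the returned DataSet to row objects

    DimensionLoader.Load(
        connection,
        troopers,
        tableName:   "dbo.Trooper",
        keyColumn:   "TrooperId",
        keySelector: t => t.TrooperId,
        insertSql:   "insert into dbo.Trooper (TrooperId, Name) values (@TrooperId, @Name)",
        updateSql:   "update dbo.Trooper set Name = @Name where TrooperId = @TrooperId");
}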

And the generic part of the code is this:

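In spirit, it’s a compare-and-sync against the target table; here is a sketch of that shape (not the original code; it assumes an integer key per dimension row and lets Dapper do the parameter mapping):

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper;

public static class DimensionLoader
{
    public static void Load<TRow>(
        IDbConnection connection,
        IEnumerable<TRow> sourceRows,
        string tableName,
        string keyColumn,
        Func<TRow, int> keySelector,
        string insertSql,
        string updateSql)
    {
        // key -> source row, so inserts/updates/deletes can be detected
        var source = sourceRows.ToDictionary(keySelector);

        var existingIds = new HashSet<int>(
            connection.Query<int>("select " + keyColumn + " from " + tableName));

        // delete rows that are no longer in the source
        foreach (var id in existingIds.Where(id => !source.ContainsKey(id)))
            connection.Execute(
                "delete from " + tableName + " where " + keyColumn + " = @Id",
                new { Id = id });

        // update rows present on both sides, insert the new ones
        foreach (var pair in source)
        {
            if (existingIds.Contains(pair.Key))
                connection.Execute(updateSql, pair.Value);   // Dapper maps properties to @parameters
            else
                connection.Execute(insertSql, pair.Value);
        }
    }
}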

This code is using Dapper to take care of the parameter-to-SQL bits.   It took about an hour or two to get the first one working just right.. and then another 1.5 hours for the remaining 6 dimension tables.

It’s still not as fast as doing the work directly in SSIS; however, with SSIS I would then have connection string management and other such fussy things to deal with, which could have taken just as long.

One nice fun thing I was able to do is take the saved XML from the web service (thank you SSIS), pull it local, embed it as a resource, and write a Stub version of the web service. 
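
The stub boils down to reading that saved XML back into a DataSet from the embedded resource.  Something along these lines (the resource and class names are made up; the real service is the client’s .ASMX):

using System.Data;
using System.Reflection;

public class StubConfigurationService
{
    public DataSet GetConfiguration()
    {
        // the resource name follows <default namespace>.<folder>.<file name>
        var assembly = Assembly.GetExecutingAssembly();
        using (var stream = assembly.GetManifestResourceStream("MyProject.TestData.Configuration.xml"))
        {
            var dataSet = new DataSet();
            // assumes the saved XML carries its inline schema; if not, load it first with ReadXmlSchema
            dataSet.ReadXml(stream);
            return dataSet;
        }
    }
}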


Good Times.