Process with States

I implemented this again today, and realized it was a pattern that I fall back on regularly.

When dealing with a system with several states, and actions to be performed at each state, I usually break things up into a state diagram.  Inevitably, I end up with a “State” field, and then the question becomes what to put in that field.

Take an example of sending an invite to a user to come in and use an app.

What I do not do:

alter table Invitation
    add state int not null default(1)

enum State { Start = 1, Email = 2, Consumed = 3, Expired = 4 }

What I do (approximately):

alter table Invitation add
	state varchar(25) null,
	stateError bit default(0) null,
	stateTimestamp datetime default(getdate()) not null

public static class InvitationState { 
   public const string SendEmail="SendEmail";
   public const string WaitInbound="WaitInbound"; 
   public const string InviteConsumed="InviteConsumed"; 
   public const string InviteExpired="InviteExpired";
}

Why?

  • It is my experience that, for any flow, a future spec will come along that adds new states in the middle of that flow.  If you go with an int value, you end up with the flow being 1->9->5->6.   By going with a varchar, it’s much more obvious what is going on when doing a select from the db, and it allows for the addition of new states.
    • A previous employer loved using CHAR(4) and state names like “STRT”,”MAIL”,”CONS”,”EXPR” because Oracle stored the CHAR(4) like an int.   Very similar to Apple’s Resource Forks from back in the day.
    • I’m not so worried about the speed difference it makes.  I feel that the flexibility it gives makes up for any performance loss.
  • Either we have an action state, named with a “Verb Noun”, or we have an end state, named with a “Noun Verb”.
  • The state names are based on what IS happening or SHOULD happen to the row, not what has already happened, except for the end states.
  • For any action state, if the attempt at performing that action fails, I set the stateError bit to 1.  This lets me know I need to do something to fix that; when it’s fixed, I can change the stateError back to 0.
    • In large batch systems, I write the console-based action-taker with an override switch:  to work on a specific Id, even if stateError=1; and to do it in a transaction but not commit the transaction.
    • This lets me test things with rows that failed, while the main batch system processes all the other non-errored rows.   Once I get it working, I tell it to go ahead and commit, and the system continues on its merry way.  (There’s a sketch of this kind of action-taker after the list.)
  • The system only deals with the states that it knows.  This lets the technical OPS team (whoever that is) use the state field for its out-of-band processes, stuff like setting the state to ‘Ken’, for example.   Because Ken is the man who knows what is up with those rows.
  • In really large critical systems, there’s an audit table and a trigger that auto-inserts any changes to the audit table. 
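To make the action-taker idea concrete, here is a rough sketch of what one of those console processors tends to look like.  This is not code from any particular project – the Invitation class, the SendInviteEmail() call, and the override switches are made up for illustration, and it uses Dapper for brevity – but the shape is always the same: pick up rows in a known state, try the action, advance the state on success, flag stateError on failure, and optionally roll back instead of committing.

using System;
using System.Data.SqlClient;
using Dapper;   // handles the parameter-to-SQL mapping

public class Invitation { public int Id { get; set; } public string Email { get; set; } }

public static class SendEmailProcessor
{
    // onlyId / commit correspond to the console override switches described above.
    public static void Run(SqlConnection conn, int? onlyId = null, bool commit = true)
    {
        using (var tx = conn.BeginTransaction())
        {
            // Normal runs skip errored rows; the override targets one Id even if stateError = 1.
            var rows = conn.Query<Invitation>(
                @"select Id, Email from Invitation
                   where state = @state
                     and ((stateError = 0 and @onlyId is null) or Id = @onlyId)",
                new { state = InvitationState.SendEmail, onlyId }, tx);

            foreach (var row in rows)
            {
                try
                {
                    SendInviteEmail(row);   // the action this state exists to perform
                    conn.Execute(
                        @"update Invitation
                             set state = @next, stateError = 0, stateTimestamp = getdate()
                           where Id = @id",
                        new { next = InvitationState.WaitInbound, id = row.Id }, tx);
                }
                catch (Exception)
                {
                    // The action failed: flag the row for a human to look at, and keep going.
                    conn.Execute("update Invitation set stateError = 1 where Id = @id",
                        new { id = row.Id }, tx);
                }
            }

            if (commit) tx.Commit();
            else tx.Rollback();   // “don’t commit” mode, for safely poking at failed rows
        }
    }

    static void SendInviteEmail(Invitation invite) { /* actually send the email */ }
}

Run it normally and it churns through everything that isn’t errored; run it against one specific Id with commit turned off and you can replay a failed row inside a transaction that gets rolled back.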

The result of this is, I get Christmas cards in the mail from former clients with phrases like “Still amazed by your code!!!”.   🙂

Hope this helps. 

Enjoying Git

My current client uses an (on-site?) enterprise GitHub thingy. We do basically branch-per-feature, which means a lot of merging.  And surprisingly few conflicts.   I find myself constantly going back to this graph:

image

It’s not available on the free public GitHub, AFAIK.    You can see where people pulled from Refactor back into their feature branch, and how much context switching there is (we have two folks working on the project).

A video that really explained git to me was this:

Awesome Git Video: http://www.youtube.com/watch?v=1ffBJ4sVUb4

It highlighted the directed graph structure, and the role that nodes and labels play.  After watching it, I understood Fetch, Pull, Push, Add, etc – it all made sense! 

However, I’ll confess – I’m using TortoiseGit, not the git command line.

OpenCV: Or Not

I was hoping this would be a blog post about some wonderful stuff I was doing in OpenCV to detect the location of soccer balls in video – for the purpose of writing a robot that controls the Pan-Tilt-Zoom of the video camera to automatically follow the action in a soccer game, so that I don’t have to (lazy).

However, the program keeps hanging.

I started commenting out code to figure out what I was doing wrong.

Here’s the final version that hung:

using OpenCvSharp;
using OpenCvSharp.CPlusPlus;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace OpenCvTest
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                Console.WriteLine("Hey");
                //using (var cap = CvCapture.FromFile(@"00026.MTS"))
                //using (var cvWindow = new CvWindow("output"))
                //{
                //    var size = new Size(9, 9);
                //    while (CvWindow.WaitKey(10) < 0)
                //    {
                //        using (IplImage imgSrc = cap.QueryFrame()) 
                //        using (IplImage imgSmall = new IplImage(new Size(640, 480), BitDepth.U8, 1))
                //        using (IplImage imgGray = new IplImage(imgSmall.Size, BitDepth.U8, 1)) 
                //        {
                //            cvWindow.ShowImage(imgSrc);
                //            Cv.Resize(imgSrc, imgSmall);
                //            Cv.CvtColor(imgSmall, imgGray, ColorConversion.BgrToGray);
                //            Cv.Smooth(imgGray, imgGray, SmoothType.Gaussian, 9);
                //            using (CvMemStorage storage = new CvMemStorage())
                //            {
                //                CvSeq<CvCircleSegment> seq = imgGray.HoughCircles(storage, HoughCirclesMethod.Gradient, 1, 100, 150, 55, 0, 0);
                //                foreach (CvCircleSegment item in seq)
                //                {
                //                    imgGray.Circle(item.Center, (int)item.Radius, CvColor.Red, 3);
                //                }
                //                cvWindow.ShowImage(imgGray);
                //            } 
                //        }
                //    }
                //}
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(ex.Message);
            }
            finally
            {
                Console.WriteLine("Done");
            }

        }
    }
}
  • If you look closely, you’ll realize that all the lines are commented out.    And it still hung.  
  • 1 Reboot later:   Nope, still hanging.
  • Remove references to OpenCV:   Still Hangs.
  • Turns out if I run it without the debugger, it’s okay, but running it under the debugger, it hangs.

Well, La De Derp.   I don’t like it when things like Debugging don’t work.   Makes me .. trust things less.  Granted, this is Visual Studio Express.  But still!    Back to Console.WriteLine debugging for now…

Here’s some links to some code:

What I’ve learned:

  • In the future, I’d probably use another library – EmguCV?   Seems more mature?  
  • I might be able to use a <insert technical terms here> (other forms of detection – FAST something? BURPY something? I don’t remember exactly) … something which can be trained from a single image using 64-dimensional wavelets.  Woo.    Not today.
  • I have not researched PTZ APIs.
  • I don’t have a solution for splitting the stream so it goes both to video capture and to ball detection.
  • I’m not sure about the easing functions that would be involved; there’s quite a delay from detection (if I ever get that far) to movement.
  • I’m not sure what to do with the ball kids playing with soccer balls at the sides of the field.

Day at the Office: Naming Anonymous Types

I ran into a problem yesterday.  I’m in the middle of a large routine.  It looks at a set of projected changes and determines what the errors would be if those changes were applied.  Most of these errors have to do with balances going out of bounds, so there are several balances rolled up at various levels.

Where It Started

At first, I created several Dictionary<??,decimal> of balances like this:

var balance1 = ....
    .GroupBy(t => new { t.ClientId, t.CommodityId, t.BucketId })
    .ToDictionary(
        g => g.Key,
        g => g.Sum(x => x.Quantity1)
        );

The Problem

The code grew, exceeding 300 lines of code in one routine.  Time to refactor.  However, when I extracted a method, the refactoring tool did not know what to name my anonymous type, and I got this:

void PerformFancyMath(Trade t, Dictionary<??,decimal> balances)

More detail and other options here:   http://stackoverflow.com/questions/6624811/how-to-pass-anonymous-types-as-parameters

Not A Problem!  I’ll stop using the anonymous type, and create a helper class for it.

My first attempt was something like this:

public class DooHickey { 
    public int ClientId { get; set; }
    public int CommodityId { get; set; }
    public int BucketId { get; set; }
}

However, this does not work, because DooHickey does not define Equals() or GetHashCode(), so the dictionary doesn’t know how to match keys together.    

I hate writing those boilerplate bits of code – I like F#’s idea where it defines these for you, same as C# anonymous types.

My Solution

Rather than writing out that code, this is what I did instead:

public class DooHickey : Tuple<int, int, int>
{
    public DooHickey(int a, int b, int c) : base(a, b, c) { }

    /* This bit is not really necessary, but could be done:
    public int ClientId    { get { return base.Item1; } }
    public int CommodityId { get { return base.Item2; } }
    public int BucketId    { get { return base.Item3; } }
    */
}

  • I didn’t realize I could have a Tuple with 3,4,5 or more constituents.
  • The Tuple handles GetHashCode() and Equals() for me.
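For completeness, here is roughly what the call sites look like once the helper class is in play.  (This is a reconstruction, not the original code – the field and method names are carried over from the snippets above.)

// The GroupBy key becomes a DooHickey instead of an anonymous type...
var balance1 = trades
    .GroupBy(t => new DooHickey(t.ClientId, t.CommodityId, t.BucketId))
    .ToDictionary(
        g => g.Key,
        g => g.Sum(x => x.Quantity1)
        );

// ...which means the extracted method finally has a nameable parameter type.
void PerformFancyMath(Trade t, Dictionary<DooHickey, decimal> balances)
{
    var key = new DooHickey(t.ClientId, t.CommodityId, t.BucketId);
    decimal balance;
    balances.TryGetValue(key, out balance);   // Tuple’s structural Equals/GetHashCode make the lookup match
    // ... the actual fancy math ...
}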

Now armed with my helper class, I was able to refactor my code and bring my methods down to under a screenful each. 

Success!

Primer on SSDT / Checking Schema into Source Control

I’ve been learning a lot of new things with this client – SSIS, SSRS, and now we’ve also implemented a workflow using SSDT.   In short:  This is a well supported way to check your database schema and seed data into source control.  So, looking back, what are the resources that I wish I had known going in?

Installation

Near as I can tell, SSDT is installed if/when you install SQL Server Express.   There’s a basic version also installed if you install VS2012 Premium and up, supposedly.

It can also be installed directly via http://msdn.microsoft.com/en-us/jj650015

Once installed, you’ll be able to do Schema Compares, Data Compares (useless to me?), and database projects (*.sqlproj).   As a lot of people don’t seem to have it installed, I usually put the database project in a different solution than the main project solution.

Capturing an Existing Database into a Database Project

  • Create a database project; if you choose to start with a blank one, then…
  • Right-click the solution, Compare Schema, SWITCH DIRECTIONS, Update Target.
  • Hint: don’t save any user objects, not until you are very comfortable with it.

image

image

As of the current version, I no longer have to have one database project, and a different server project (as I did when I looked at it in 2010 or so).

Making Changes

You can either make changes in the database and then suck them in, or edit them directly in Visual Studio – either graphically or in SQL.

image

A build action will verify every reference – for example, sprocs referencing columns, etc.   There’s a somewhat confusing warning about incorrect casing, which you can turn off.

When you check the changes into source control, they are saved as the T-SQL version.  Very self-documenting, diffable, etc.

Applying Changes to Local Database Interactive

Use schema compare, select/deselect what you want to update, and hit Update.   

This does NOT run pre- and post- deployment scripts.

Applying Database Changes Automated / Without Visual Studio

Compile the project and you get a .DACPAC file.   Then use SQLPACKAGE.EXE, usually located at C:\Program Files\Microsoft SQL Server\110\dac\bin\sqlpackage.exe.  Each example below is a single command line; newlines added for readability.

“If I publish it, what will it do?”

  • /action:DeployReport
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /OutputPath:differences.xml
  • /p:DropObjectsNotInSource=false

“What is absolutely everything that is different between the two?”

  • /action:DeployReport
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /OutputPath:differences.xml
  • /p:DropObjectsNotInSource=true

“Do it. (but don’t delete those users!)”

  • /action:Publish
  • /sourceFile:Flintstone.Database.dacpac
  • /TargetServerName:BEDROCKCITY
  • /TargetDatabasename:FlintStone
  • /p:DropObjectsNotInSource=false
  • /p:BlockOnPossibleDataLoss=false

It runs it with pretty colors in a console app.
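As an aside, the same deployment can be driven from managed code via the DacFx API (the Microsoft.SqlServer.Dac assembly that sqlpackage.exe is built on).  I haven’t used this route on the project, so treat the following as a sketch rather than a recipe – the connection string and file names just mirror the examples above:

using Microsoft.SqlServer.Dac;   // DacFx – same engine sqlpackage.exe uses

class DeployDacpac
{
    static void Main()
    {
        var services = new DacServices(
            "Server=BEDROCKCITY;Database=FlintStone;Integrated Security=true");

        using (var package = DacPackage.Load(@"Flintstone.Database.dacpac"))
        {
            var options = new DacDeployOptions
            {
                DropObjectsNotInSource = false,      // same switches as the command line
                BlockOnPossibleDataLoss = false
            };

            // Equivalent of /action:DeployReport – “if I publish it, what will it do?”
            string report = services.GenerateDeployReport(package, "FlintStone", options);
            System.IO.File.WriteAllText("differences.xml", report);

            // Equivalent of /action:Publish – do it.
            services.Deploy(package, "FlintStone", upgradeExisting: true, options: options);
        }
    }
}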

With Sql Server Management Studio, caveats

In SSMS, under database, tasks, there is “Update Database-Tier Application”.  This does effectively the same thing, except you don’t get to choose the options above, so it usually does too much.  However, you can save off a change script and then walk the change script by hand.

Handling Data Changes

This guy wrote an excellent blog article about it:  http://schottsql.blogspot.com/2012/11/ssdt-pre-and-post-deploy-scripts.html.  I extended it a little bit:

  • Created a ScriptHistory table with a script name (PK) and when it was run, to avoid rerunning
    • This is for the one-time refactors that move data from an old location to a new one (or go from inferred settings to explicit settings) – scripts that cannot safely be run twice, and where there is no other way to detect whether they have already been run.

image

image

Best Practice for Security / Users / etc

I have not had a chance to do this yet, but the general idea is to create roles for permissions, have SSDT deploy the roles, but the user membership is unique per server and is handled manually.  (We’re talking about situations like I am developing in COMPANY1’s domain, but the production server is in COMPANY2’s domain and Windows Auth is being used.)

In Conclusion

I watched my dba-idol Richard Svatos struggle with this in 2003.

I’ve tried to solve it in a variety of ways.

I’m glad there is now a usable, well-supported solution in the ecosystem in which I am flourishing.

Headless Install Ubuntu on Mediasmart Server

I’m trying an experiment:

  • Connect a 750G drive to my machine via a SATA-to-USB adapter
  • Use VMware Player
  • Create a VM
  • Attach the VM to this 750G USB device
  • Install Ubuntu Desktop from ISO (Desktop = 32-bit; the target is a 32-bit server with 2G RAM)
  • When done, transplant the drive into the headless server
  • Hope it boots up and gets to the network so I can ssh into it without a display or keyboard.

According to my research, there is a gotcha step, where I tell it that “every time I boot, rewrite what is eth0 and what is eth1, etc”.  See the docs here if needed:  https://www.google.com/search?q=70-persistent-net.rules  (I have not needed to do this so far)

image

Problem Number 1.  When I reboot:

image

Then it got worse: it failed to recognize the USB-connected drive as a valid boot source, and I got “Operating System Not Found”.

Ah well!  Let’s try it anyway.  Disconnect the drive, stick it in the Mediasmart server in the first spot, and power on…

  • It sure sounds like it’s booting.  Lots of spinning going on with that drive.
  • I can now ping it! 
    • image
  • I can confirm that the lights on the back of the machine coincide with ping packets!

Time to download PuTTY and either telnet or ssh into my new disk-array computer!

Problem: neither telnet nor ssh work – I guess the servers are not up and running.

The next step would be to put the drive in some hardware that can successfully boot it, install telnetd and sshd, and then move it back into the headless server.    I only have one machine at home that can do that, and that’s my main computer… which I need for work here in a bit, so it’s not happening tonight.

However, maybe I could look into setting up Ubuntu on a bootable USB flash drive?

Mocking Time at 1000x speed

I had a complicated bit of logic that involved multiple threads, waiting for the right time, checking for time expiration, etc.   I wanted to unit test it, which meant mocking Time.  And not in a simple way, but in a complex, works-across-threads way.

This is what I ended up with:

image

That’s pretty simple.  The fun stuff happens in the Stub version:

image

Thus my logic component – which spawns multiple threads, each of which takes a section of work to do, where each unit of work has a “cannot start before” and a “must not end after” – runs at 100x or 1000x normal speed under the unit tests, and the simulation stays accurate across all the threads.
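The screenshots of the two versions didn’t make it into this text, so here is a minimal sketch of the kind of abstraction being described.  The interface name and the scaling mechanics are my reconstruction, not the original code: production code only ever asks an injected clock for the time (and to sleep), and the stub version runs that clock faster than real time while staying consistent for every thread that asks.

using System;
using System.Threading;

// Production code depends on this instead of DateTime.UtcNow / Thread.Sleep.
public interface ISystemTime
{
    DateTime UtcNow { get; }
    void Sleep(TimeSpan duration);
}

// The real implementation just passes through.
public class RealSystemTime : ISystemTime
{
    public DateTime UtcNow { get { return DateTime.UtcNow; } }
    public void Sleep(TimeSpan duration) { Thread.Sleep(duration); }
}

// The stub for unit tests: time runs N times faster, shared across all threads.
public class FastSystemTime : ISystemTime
{
    private readonly DateTime _simulatedStart;
    private readonly DateTime _realStart;
    private readonly double _speed;

    public FastSystemTime(DateTime simulatedStart, double speed = 1000.0)
    {
        _simulatedStart = simulatedStart;
        _realStart = DateTime.UtcNow;
        _speed = speed;
    }

    // Every thread sees the same accelerated timeline.
    public DateTime UtcNow
    {
        get
        {
            var realElapsed = DateTime.UtcNow - _realStart;
            return _simulatedStart + TimeSpan.FromTicks((long)(realElapsed.Ticks * _speed));
        }
    }

    // A one-minute wait in simulated time only costs 60ms of real time at 1000x.
    public void Sleep(TimeSpan duration)
    {
        Thread.Sleep(TimeSpan.FromTicks((long)(duration.Ticks / _speed)));
    }
}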

Flow, In the Zone

Blessed with a long drive, I caught up on some of my favorite podcasts.  I specifically listened to “Getting into the Zone with Mark Seeman” and it helped me light a bulb.  (53 minutes long, there should be a transcript in a few more weeks)

Lately, I’ve been struggling with “Do I want to stay technical, or should I start looking at changing focus?”  I know I would, and do, make a very good team technical lead.  I am very good at interpreting client needs and designing architectures.    And I’m not as eager, sometimes, to dive into new stuff the way some of my co-workers are.    I wrote a long blog post about it, did not publish it, solicited feedback, and I’ve been sitting on it for a bit.

Then I started working on my current project, and I’ve been having a blast.  Why?  Because I’m learning (SSIS and Dapper, in this case).  And what I’m finding is that, as I accomplish bits of code with these new things, I’m doing it “in the zone”.  Or in “flow”.

How can I set myself up for more occurrences of Flow in my daily work? 

In addition to making the list of things that I could do during the day, I’m trying to state what I find rewarding in the task as well.   The rewards need to be swift – something that I could do in a Pomodoro.   Here’s a sample list:

What it is | Negative Look | Reward
Write Documentation | Boring? | Regenerate the TOC! (commit)
Write coupling code to copy web service data | Plumbing. | Commit! Every! Table!  And Run It! Every! Table!
Write complicated logic thingy | Bound to be buggy | Extract it to a component, make it testable, and write Lots! of! Tests!
Status Report | Yet again. | Generate Awesome Graph!  5 Sections!  Each gets an Exclamation!
Write SSRS Report (I don’t know SSRS yet) | Fear, I’ll get it wrong, etc | Time Myself! How fast can I watch the Pluralsight video? 2x? 1.5x?  Reassess after watching the video.

Then, as I write down priorities, I actually write the priorities next to the rewards, so that when I’m choosing the next task, I’m not choosing the task… I’m choosing the reward I’m trying to work towards.

Going to give it a shot. 

Day at the Office: SSIS, DiffGrams, XML, Dimension Loading using Dapper

Part of my project is to pull configuration data from a web service provided by my client and save it locally for our app to use.  The client wrote a very simple .ASMX, which returns a DataSet. 

My first thought was to use SSIS to extract the data from the web service, and then do a dimension load into my local tables.   But I hit a problem:  when the SSIS control task saved the DataSet, it saved it as a diffgram.

The result of this is that when using the SSIS Data Flow task for “pull me some data from an XML File”, while it does get the schema correct, it gets 0 rows, every time.

There is a workaround to use XSLT to transform this XML so that it works correctly:  Examples here and here;  but I discovered this a bit late in the game.  I ended up writing my own code to retrieve the data from the web service, and do a simple dimension load (insert, update, delete) against my local table.  The code that calls it looks like this:

image

And the generic part of the code is this:

image

This code is using Dapper to take care of the parameter-to-SQL bits.   It took about an hour or two to get the first one working just right… and then another 1.5 hours for the remaining 6 dimension tables.
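The screenshots of the actual code didn’t make it into this text, so here is a rough reconstruction of the shape of that dimension load – the table, column, and method names are made up, not the real ones, and it leans on Dapper the same way the real code did: compare the incoming rows to what is in the table, then insert the new, update the changed, and delete the missing.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

// Hypothetical DTO for one of the dimension tables.
public class Commodity
{
    public int CommodityId { get; set; }
    public string Name { get; set; }
}

public static class DimensionLoader
{
    // Insert / update / delete the local table so it matches what the web service returned.
    public static void LoadCommodities(SqlConnection conn, IEnumerable<Commodity> incoming)
    {
        var existing = conn.Query<Commodity>("select CommodityId, Name from dbo.Commodity")
                           .ToDictionary(c => c.CommodityId);
        var incomingById = incoming.ToDictionary(c => c.CommodityId);

        foreach (var row in incomingById.Values)
        {
            Commodity current;
            if (!existing.TryGetValue(row.CommodityId, out current))
            {
                conn.Execute(
                    "insert into dbo.Commodity (CommodityId, Name) values (@CommodityId, @Name)", row);
            }
            else if (current.Name != row.Name)
            {
                conn.Execute(
                    "update dbo.Commodity set Name = @Name where CommodityId = @CommodityId", row);
            }
        }

        // Anything local that the web service no longer knows about goes away.
        foreach (var goneId in existing.Keys.Except(incomingById.Keys))
        {
            conn.Execute("delete from dbo.Commodity where CommodityId = @id", new { id = goneId });
        }
    }
}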

It’s still not as fast as doing the work directly in SSIS; however, with SSIS I would then have connection string management and other such fussy things to deal with, which could have taken just as long.

One nice fun thing I was able to do was take the saved XML from the web service (thank you SSIS), pull it local, embed it as a resource, and write a stub version of the web service.

image

image

image
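Those screenshots are gone too, but the trick is easy to sketch: the saved XML goes into the project as an embedded resource, and the stub simply reads it back into a DataSet, the same shape the real .ASMX proxy would have returned.  (The resource and class names here are made up.)

using System.Data;
using System.Reflection;

// Stand-in for the generated web service proxy, used for tests and offline development.
public class ConfigServiceStub
{
    // Same signature as the real proxy method it replaces.
    public DataSet GetConfiguration()
    {
        var ds = new DataSet();
        using (var stream = Assembly.GetExecutingAssembly()
                   .GetManifestResourceStream("MyApp.Resources.Configuration.xml"))
        {
            // Auto read mode copes with the inline schema + diffgram that was saved off.
            ds.ReadXml(stream);
        }
        return ds;
    }
}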

Good Times.

Having Fun with SSIS

My new project involves getting data from 500+ remote sites into a target database; in parallel, quickly.  Thanks to some proof of concept work done by my coworker Mike “The Devastator” Hurd (I made up the nickname), I got to learn SSIS, and I’m cursing that I had not picked it up earlier. 

What Is SSIS?

  • Imagine you wanted to write a generic program to get data from point A to point B, but modify it along the way.
  • You want pluggable components that can talk to each other.
  • You want anything that can be done in parallel to be done in parallel – i.e., read from two tables at the same time if you have the bandwidth.
  • You want it to work buffered rather than row by row, but to dynamically figure out buffer sizes so that they fit the page size on the computer (to minimize swapping).
  • You want a GUI that you can configure these components with.
  • You want every component to have a configurable “if things fail” behavior, to redirect failed rows to other components, etc.
  • You want it to be solid – no leaking memory, etc.
  • You want to parameterize the stuff getting into the components.
  • You want it to be runnable from a command line, with the ability to override parameters and connection strings.
  • You want the whole configuration to be saveable as an XML file, or hosted in a SQL Server.
  • You want to be able to schedule it to run from SQL Agent (SQL Server’s “cron” thing).
  • You want to be able to run it directly from C# via managed code (there’s a sketch of that after this list).

Then you would write SSIS.
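On that last point – running a package from C# – here is roughly what the managed route looks like.  This is from memory, and the package path and variable name are made up, but the classes (Application, Package, DTSExecResult) live in Microsoft.SqlServer.Dts.Runtime, in the ManagedDTS assembly:

using System;
using Microsoft.SqlServer.Dts.Runtime;   // reference: Microsoft.SqlServer.ManagedDTS

class RunPackage
{
    static void Main()
    {
        var app = new Application();

        // Load a package saved as a .dtsx file (it could also be loaded from SQL Server).
        Package package = app.LoadPackage(@"C:\Packages\LoadRemoteSites.dtsx", null);

        // Override a variable before running – the same idea as /SET on dtexec.
        package.Variables["SiteId"].Value = 42;

        DTSExecResult result = package.Execute();
        Console.WriteLine("Package finished: {0}", result);   // Success / Failure
    }
}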

What Does It Look Like?

image

image

image

image

Initial Gotchas

SSIS requires some level of licensing, so this does NOT work:

  • Install SQLEXPRESS
  • install Visual Studio 2012 Premium
  • Install SSIS add-on
  • Install SSDT (“database tooling for VS2012”)

It successfully loads, and you can debug packages within Visual Studio, but you cannot run them outside of Visual Studio via dtexec and dtexecui (the command-line thingies for SSIS).  Instead, you need to add:

  • Install SQL Server Developer Edition (which includes SSIS)  (or standard edition)
    • Select Integration Services & Data Tools
    • Do NOT have to install the engine.
    • Might have to install an updated SSDT to get VS2012 support

Then everything runs fine.

The other thing I quickly learned is that it is very schema-dependent.  The reason for this is that it sizes its buffers (how many rows fit in each one) based on the size of the rows – so a varchar(MAX) field is very different from a varchar(250) field – and the schema will fail to validate if the underlying data source it is reading from does not match perfectly.

The solution I had to apply was to do a select cast(x as varchar(250)) as x to extract the data, to guarantee the schema coming into SSIS.

Conclusion

Very glad to have this tool in my arsenal.   Thanks Mike!