Tutoring? Growing? Developers

I volunteered my way into a meeting today with a pretty cool guy: Rider Rodriguez Jr of www.kentuckianaworks.org.   We were discussing how to make Louisville a better city as far as software developers go – there’s a shortage, nationwide actually, and the gap is likely to grow in the future. (I have no references, but I figure I’d trust him on that one.) 

It was a great meeting!  We talked about what actually works and what doesn’t, as people in the trenches who actually do the work.  Very often it’s the biggity-bigs who don’t do the work who have opinions on how people should learn, and who are looking for some way to leverage that for dollars.  Not efficient.  We just love doing it, and we know what it takes to do the work in our industry.  

Then he presented to us the plan that he’s shooting for (applying for a grant, etc).  I’m not going to try to detail it here, as I’ll doubtlessly get it wrong, but it does involve classroom availability, labs / peer sessions with people in the industry, building a portfolio in skills and abilities that companies actually want, and internships.

Much discussion later, and some white board drawing by yours truly, it seemed like we were all on the same page.  

image

The Whiteboard.  Someone noted that it looked like a person’s face. The face.. of the anonymous software developer.. dum dum dum!

I hope this works.. and yes, if anybody *wants* to learn *anything* that I know (including! how to make Instant Mashed Potatoes.  I practiced that tonight.  My wife thinks I did it wrong.  She thinks that by adding liquid to powder, it would be lumpy.  She might be right, it was rubbery), I’d love to help them learn it.. just to see the joy in their lives as they figure it out.    I am the son of a teacher, after all.  (10th from the top)

Database Change Scripts

At CodePaLousa Day #1, I attended a session, Tools and Adoption Patterns for Automating Database Deployment, about automating database change.  That’s more than I need to do right now; however, it gave me a +1 on how I’m currently managing change, which is my default method if the customer doesn’t provide one.

What I do:

  • For every change, I start or append to a script file.
  • When a change gets to prod or “moves on” to a branch, I start a different script file.
  • Every script file must be re-runnable.
  • Every script file must have a precondition check – something that is true of the database before it can run against it.
  • For SqlServer, if the precondition check fails, I turn noexec on to prevent the rest of the script from running.
  • For SqlServer, DDL and statements that reference just-added columns are wrapped in exec() so that I don’t get compilation errors.

Here’s a sample, with names changed to protect the client:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- This is something from the most recent previous round of changes, which should already exist. 
PRINT 'Precondition Check'
if (not exists (select 1 from information_Schema.COLUMNS where TABLE_NAME='Quadrant' and COLUMN_NAME='ImperialCode')) 
begin
    PRINT '  Failed -- skipping rest of script'
    set noexec on
end else begin
    PRINT '  Passed'
end
GO 

-- In support of TA26 (revision 1683): 

PRINT 'Table NoRebelsAllowedRule'
if (not exists (select 1 from INFORMATION_SCHEMA.TABLES where TABLE_NAME='NoRebelsAllowedRule')) 
begin

    CREATE TABLE [dbo].[NoRebelsAllowedRule](
        [NoRebelsAllowedRuleID] [int] IDENTITY(1,1) NOT NULL,
        [CatalogID] [int] NOT NULL,
        [DeletedOn] [datetime] NULL,
        [DeletedByUserID] [int] NULL,
        [CreatedOn] [datetime] NOT NULL,
        [CreatedByUserID] [int] NOT NULL,
        [UpdatedOn] [datetime] NULL,
        [UpdatedByUserID] [int] NULL,
     CONSTRAINT [PK_NoRebelsAllowedRule] PRIMARY KEY CLUSTERED 
    (
        [NoRebelsAllowedRuleID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    PRINT '  Created'
end else begin
    PRINT '  Already Exists'
end
GO

PRINT 'Foreign Key FK_NoRebelsAllowedRule_Catalog'
if (not exists ( select 1 from information_schema.REFERENTIAL_CONSTRAINTS where constraint_name='FK_NoRebelsAllowedRule_Catalog' ) ) 
begin 
    ALTER TABLE [dbo].[NoRebelsAllowedRule]  WITH CHECK ADD  CONSTRAINT [FK_NoRebelsAllowedRule_Catalog] FOREIGN KEY([CatalogID])
    REFERENCES [dbo].[Catalog] ([CatalogID])
    PRINT '  Created'
end else begin
    PRINT '  Already Exists'
end
GO

PRINT '04/23/13 Replacing Catalog.CannotFireBlasters with BlasterBehaviorID'
if (not exists ( select 1 from information_schema.COLUMNS where table_name='Catalog' and column_name='BlasterBehaviorID' ) ) 
begin
    alter table [dbo].[Catalog] add BlasterBehaviorID INT NOT NULL DEFAULT 0; 
    PRINT '  Added BlasterBehaviorID'
end else begin
    PRINT '  BlasterBehaviorID Already Exists'
end
if (exists (select 1 from information_Schema.COLUMNS where table_name='Catalog' and column_name='CannotFireBlasters' ) ) 
begin
    exec (N'update [dbo].[Catalog] set BlasterBehaviorID=0 where CannotFireBlasters=0');
    exec (N'update [dbo].[Catalog] set BlasterBehaviorID=3 where CannotFireBlasters=1');
    PRINT '  Migrated Data from CannotFireBlasters'
    alter table [dbo].[Catalog] drop column CannotFireBlasters
    PRINT '  Dropped old column CannotFireBlasters'
end else begin
    PRINT '  Old column CannotFireBlasters does not exist'
end
GO

set noexec off
GO

Since writing the above sample, I’ve started adding dates and work item numbers into the script as well, to make things a bit more traceable.   I figured as long as I had a sequence going, I’d append to the same script, though in an environment with multiple developers, you may want to partition more.

Why I didn’t..

  • RedGate Sql Compare tools:  I don’t own them.
  • SQL Server Projects in Visual Studio:  When you actually get around to using them against real environments, there are a ton of problems around “Server” projects versus “Schema” projects (or some such), which are worse in 2012 than they used to be.
  • EF Migrations:  I haven’t spent the time to learn them; I’d rather stay productive and stick with what I already know and can do quickly.

Excel C# Interop

image

I made a mistake.

I had a project in mind – to pull information from Basecamp Classic (todo items, with metadata embedded), massage it in Excel, and then upload it back to Basecamp.   

I have always used Excel (or some equivalent spreadsheet) for project tracking – I never know which columns I need to add, when I need to highlight things in different colors, what kinds of charts I need to pull.  So the decision to bring the information down to Excel was an easy one.

That cost me 1.5 days of pain.

In retrospect, it would have been much easier to use EF Code First against .\SQLEXPRESS and use SSMS to do the bulk of my editing.   But, not having the time to re-tool, I pushed forward, and I did get it to work well enough to help me next week as I start work with that project…

Anyway, this is what I learned along the way:

Workbook.Save();
Workbook.Close(false);
Application.Quit();

The recipe for closing down the instance of Excel that had been opened, without getting asked if I want to save.

Marshal.ReleaseComObject(sheets);

You have to do this for pretty much every object you dink with, else the Excel COM wrapper stays open in the background. Reference: http://stackoverflow.com/questions/158706/how-to-properly-clean-up-excel-interop-objects. I haven’t fixed this yet.
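
The pattern people generally land on in that thread boils down to: keep a reference to every interop object you touch, release each one, then let the GC finish the job.  A minimal sketch of what I should be doing (names and the workbook path are illustrative, not from my actual project):

using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

static class ExcelCleanupSketch
{
    public static void Run(string path)
    {
        var app = new Excel.Application();
        Excel.Workbooks books = null;
        Excel.Workbook book = null;
        Excel.Worksheet sheet = null;
        try
        {
            books = app.Workbooks;                      // keep a reference to every object you touch...
            book = books.Open(path);
            sheet = (Excel.Worksheet)book.Worksheets[1];
            // ... do the actual work against sheet here ...
            book.Save();
        }
        finally
        {
            if (book != null) book.Close(false);        // false = don't prompt to save again
            app.Quit();

            // ...so each one can be released; otherwise EXCEL.EXE lingers in the background
            if (sheet != null) Marshal.ReleaseComObject(sheet);
            if (book != null) Marshal.ReleaseComObject(book);
            if (books != null) Marshal.ReleaseComObject(books);
            Marshal.ReleaseComObject(app);

            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}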

Sheet.Cells[1,1].Value
Sheet.UsedRange.Cells[1,1].Value

Cells start at index 1, not 0.   UsedRange is faster than Cells.

var connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fullName + ";Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

You can use ACE to query spreadsheet data; however, type coercion is sketchy.  As in, if a column has no data, that long field will come back as a string.    I ended up ditching that and reading/writing my lists of stuff directly from the worksheet so I could have direct control over the types:

Worksheet sheet1 = b.Worksheets["Tasks"];
t2 = sheet1.ReadListFromSheet<MyTask>();
sheet1.UpdateListToSheet(t2, "TodoItemId");
        public static List<T> ReadListFromSheet<T>(this Worksheet sheet) where T:new()
        {
            var nameToCol = GetNameToColumnDictionary(sheet);
            var colToSetter = GetColToSetterDictionary<T>(nameToCol);

            var result = new List<T>();
            var rowCount = sheet.UsedRange.Rows.Count; 

            for (int row = 2; row<=rowCount; row++)
            {
                var item = new T();
                foreach (var e in colToSetter)
                {
                    var col = e.Key;
                    var setter = e.Value;
                    var o = sheet.Cells[row, col].Value;
                    if (o == null) continue;
                    Type source = o.GetType();
                    var target = setter.GetParameters()[0].ParameterType;
                    try
                    {
                        object o2 = null;
                        if (target == typeof(bool?))
                        {
                            o2 = Convert.ChangeType(o, typeof(bool));  // bypass the null stuff
                        } else if (target == typeof(int?))
                        {
                            o2 = Convert.ChangeType(o, typeof (int)); 
                        } else if (target == typeof(DateTime?))
                        {
                            o2 = Convert.ChangeType(o, typeof (DateTime));
                        }
                        else
                        {
                            // this generic conversion seems to catch most everything
                            o2 = Convert.ChangeType(o, target);
                        }
                        setter.Invoke(item, new object[] {o2});

                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("Need to handle from type: {0}, value:{1} to setter type: {2}", source.FullName, o.ToString(), target.FullName);
                    }
                }
                result.Add(item);
            }
            return result; 
        }

        public static void UpdateListToSheet<T>(this Worksheet sheet, IEnumerable<T> lists, string pkColumnName)
        {
            pkColumnName = pkColumnName.Trim().ToLower();

            // Grab the list of columns
            var nameToCol = GetNameToColumnDictionary(sheet);

            // get a mapping from list member to column number / vice versa
            var colToGetter = GetColToGetterDictionary<T>(nameToCol);


            // grab the mapping of pk to row number
            if (!nameToCol.ContainsKey(pkColumnName)) throw new Exception("Could not locate primary key " + pkColumnName + " on sheet " + sheet.Name);
            var pkColumnNumber = nameToCol[pkColumnName];
            var pkToRow = new Dictionary<string, int>();
            var emptyRows = new List<int>();
            for (var r = 2; r <= sheet.UsedRange.Rows.Count; r++)
            {
                var pkValue = sheet.Cells[r, nameToCol[pkColumnName]].Value;

                if (pkValue == null)
                {
                    emptyRows.Add(r);
                    continue;   // no PK in this row; remember it as a slot for appends later
                }

                var test = pkValue.ToString();
                if (test == String.Empty) continue;
                pkToRow[test] = r;
            }

            // get a direct line to the getter of the pk
            if (!colToGetter.ContainsKey(pkColumnNumber)) throw new Exception("Could not locate primary column in DTO????");
            MethodInfo pkGetter = colToGetter[pkColumnNumber];

            // write the list contents into matching column names in the spreadsheet, updating as we go
            foreach (var item in lists)
            {
                var pkVal = pkGetter.Invoke(item, null).ToString();
                if (String.IsNullOrEmpty(pkVal)) continue; // inbound item has no PK

                // figure out which row to write to
                int rowToUpdate = -1;
                if (pkToRow.ContainsKey(pkVal))
                {
                    // row already exists .. update it
                    rowToUpdate = pkToRow[pkVal];
                }
                else
                {
                    // row does not exist .. append it
                    if (emptyRows.Count > 0)
                    {
                        rowToUpdate = emptyRows[0];
                        emptyRows.RemoveAt(0);
                    }
                    else
                    {
                        rowToUpdate = sheet.UsedRange.Rows.Count + 1;
                    }
                }

                // write it out
                foreach (var e in colToGetter)
                {
                    int col = e.Key;
                    MethodInfo getter = e.Value;
                    var o = getter.Invoke(item, null);
                    // any translations from .Net to Excel would happen here - none so far
                    sheet.Cells[rowToUpdate, col] = o;
                }
            }
        }

        #region Private

        private static Dictionary<string, int> GetNameToColumnDictionary(Worksheet sheet)
        {
            var nameToCol = new Dictionary<string, int>();
            for (var c = 1; c <= sheet.UsedRange.Columns.Count; c++)
            {
                var columnName = sheet.Cells[1, c].Value;
                if (!(columnName is string)) continue;
                if (String.IsNullOrEmpty(columnName)) continue;
                columnName = columnName.Trim().ToLower();
                nameToCol[columnName] = c;
            }
            return nameToCol;
        }

        private static Dictionary<int, MethodInfo> GetColToGetterDictionary<T>(Dictionary<string, int> nameToCol)
        {
            var colToGetter = new Dictionary<int, MethodInfo>();
            var properties = typeof (T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            foreach (var p in properties)
            {
                if (!p.CanWrite || !p.CanRead) continue;
                MethodInfo mget = p.GetGetMethod(false);
                if (mget == null) continue;
                var lowerPropName = p.Name.ToLower();
                if (!nameToCol.ContainsKey(lowerPropName)) continue; // not present in target xls
                colToGetter[nameToCol[lowerPropName]] = mget;
            }
            return colToGetter;
        }

        private static Dictionary<int, MethodInfo> GetColToSetterDictionary<T>(Dictionary<string, int> nameToCol)
        {
            var colToGetter = new Dictionary<int, MethodInfo>();
            var properties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            foreach (var p in properties)
            {
                if (!p.CanWrite || !p.CanRead) continue;
                MethodInfo mset = p.GetSetMethod(false);
                if (mset == null) continue;
                var lowerPropName = p.Name.ToLower();
                if (!nameToCol.ContainsKey(lowerPropName)) continue; // not present in target xls
                colToGetter[nameToCol[lowerPropName]] = mset;
            }
            return colToGetter;
        }

Known limitations so far:

  • Not yet memory-leak proof
  • Can only use a long as an id
  • Does not handle nullable columns gracefully

I’ll post more later on what I was trying to do.   I did finally get it to work.   Sketchy as it all is.  I love that word Sketchy… thank you Rutledge Wood!

Now I have to go pick up my race bib for a 10 mile race tomorrow. 

On Transitioning Assets between Environments

I’ve previously talked about unit tests vs. integration tests and various database-inclusive testing strategies.  I have recently been dealing with some pain in making sure everything needed for some code is moved between environments – a process fraught with human intervention, forgotten steps, mistakes and ambiguity. In an IM conversation with one really smart guy about this (Hi Josh!), an idea was born.

Preliminary to the Idea

We’ve gotten pretty good at getting code between environments.  There are all kinds of tools in source control for branching and merging, different flavors of branching, and different ways to package code up to send it places; my personal favorite is using a build server to do the actual deployment (thus documenting it). 

However, every customer I’ve worked with struggles with how to get [the right] database changes into an environment. None of them think they struggle with it, but they all do (well, 3 of the last 4) – a lot more than developers struggle with getting code moved around.   The DBAs are usually saying “all you have to do is X”, the developers are saying “you want me to do WHAT?”, and they meet somewhere in the middle.

The standard idealist’s view of this world is: check your database into source control.  The tricky bit becomes: using which solution?  Visual Studio SQL Server Database Projects? Redgate SQL Source Control? E/R Studio Repository? RoundhousE?   I have not yet met a client who does this; usually there is some really weird way that the description of what needs to change gets handed to the DBA, who then takes the changes the rest of the way.

To broaden the scope, the kinds of assets that I’ve seen companies struggle with:

  • Database schema changes
  • Database data changes
  • Configuration entries per environment
  • The right list of components being deployed per environment
  • All of the above when adding a new environment (essential for feature branching)

Target Audience

Folks working at small companies, where they usually have full access to the production server, might scoff at this; sorry guys you are not my current target audience. 

The target audience is folks working at large enterprise companies, that host their own software — usually those that have Sarbanes-Oxley compliance that states: NO DEVELOPER CAN TOUCH THE PROD BOXES, EVER.    [sidebar: maybe it doesn’t actually say that, but it’s the excuse given every time].  It’s a black box. 

The target audience is also NOT people writing sellable software, where hopefully 500 customers will run an install 500 times – in those environments, there’s usually a chunk of resource per release to get the installers right.   And a lot of VM’s to run installs on.   Over and over. 

Personal peeve: I ended up trying to communicate with other folks on what needs to happen, using a very limited vocabulary, often with edge cases that don’t fit their language.  Sometimes the chasm between our experiences was even larger; for example, in production they had a web farm behind a certain kind of load balancer with a firewall, and in the lower environments where I had access, they didn’t (too expensive, not in the department’s budget).   But I certainly was not allowed to log in to those things to get a “feel” for them and how to work with them; yet I was expected to tell them exactly what they needed to get it running.    Personal solution:  Cookies.  Shared.

The Birth of the Idea:

At my previous salaried job, I was in charge of adding some stuff to a website so that it could communicate with a web service. This web service was in charge of taking uploaded documents and processing them further, then depositing the results in a shared folder. I added a Test<ReleaseName>.aspx page to that website, which talked to a TestWebService.asmx page that sat next to the normal web service, and they talked about stuff like: 

  • What identity was the service running under?
  • Did the service have read/write/delete access to the drop folder?
  • Could the service transfer a LAAAARGE file without dying over the network?

Within the website and the web service, I made sure to limit the request so that it would only “do work” if the sender was from the developer’s subnet, specifically excluding the subnet that the internet came in on (after they got through the firewall).   Everything not recognized was immediately 404’ed.

The result was I could walk over, hang out with the build/deploy guy (without touching his keyboard and breaking compliance), and have him browse to the test website – and it told him everything that was going on, especially what was broken. We spent a LONG time trying to get the identity to run as the right person with the right access to the right folders, and if I hadn’t built the test pages, it could have gotten really frustrating.   (Rather than BUILDGUY=>”no”=>DEVELOPER=>”try”=>PERMISSIONSGUYS=>”done”=>DEVELOPER=>”again?”=>BUILDGUY, it was [DEVELOPER+BUILDGUY]=>”try”=>PERMISSIONSGUYS=>”done”=>[DEVELOPER+BUILDGUY]=>”yay!”)

The Idea: Create Tests

Here’s a sample of the kinds of tests that I’m talking about:

  • Can I communicate to W1 web service correctly?
  • Does table T1 exist?
  • Does table T2 have an entry for E1,E2?
  • Does table T3 have a column C1?
  • Is there any data that matches this known pattern SQL1 which should no longer exist?
  • Does Table T4 have an index I1? 
  • Does stored procedure SP1 exist?
  • Does stored procedure SP1 have xyz in its definition?   (this could get fancier in your environment if you have stored procedure revision numbers; mostly trying to ensure that a stored procedure got updated)
  • Does any stored procedure have any references to XXX in their definition? (when dealing with the case where many stored procedures need to be modified because a table is going away)
  • Do at least N stored procedures have any references to XXX in their definition? (when dealing with multiple stored procedures being updated because something required got added to a table)
  • Can I read the configuration entry for E1?
  • Do I have {x/y/z} permission to network path N1?
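
To make that concrete, here’s a minimal sketch of what two of those checks could look like, assuming plain ADO.NET against SQL Server; the table, column, and folder names are placeholders:

using System;
using System.Data.SqlClient;
using System.IO;

public static class AssetChecks
{
    // "Does table T3 have a column C1?" -- the same INFORMATION_SCHEMA trick the change scripts use
    public static bool TableHasColumn(string connectionString, string table, string column)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "select count(*) from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME=@t and COLUMN_NAME=@c", conn))
        {
            cmd.Parameters.AddWithValue("@t", table);
            cmd.Parameters.AddWithValue("@c", column);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }

    // "Do I have write permission to network path N1?" -- try it and report, rather than guess
    public static bool CanWriteTo(string folder)
    {
        try
        {
            var probe = Path.Combine(folder, "permissioncheck_" + Guid.NewGuid() + ".tmp");
            File.WriteAllText(probe, "probe");
            File.Delete(probe);
            return true;
        }
        catch { return false; }
    }
}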

The Proposal

  • To add a test/configuration channel to every component of what assets are required for that component to function
  • The structure could be like this (C# pseudo-code, written in Notepad):
  • public class Requirement { 
        public string Name { get; set; }            // to help us all talk about the same requirement
        public string ReleaseName { get; set; }     // in which release was this introduced
        public DateTime CreationDate { get; set; }  // when was the requirement created? 
        public string Description { get; set; }     // the "why" of the test, documentation of the asset needed or what role it plays
    }
    
    public class RequirementResult { 
        public Requirement Requirement { get; set; }                      // which requirement are we talking about? 
        public bool? Success { get; set; }                                // was it successful?  null = don't know
        public string Result { get; set; }                                // the result of the test, what's broken, or what worked
        public IEnumerable<RequirementResult> ChildResults { get; set; }  // sub-component results
    }
  • If a component depends on a child component, have it be able to query the child component’s test channel (example: web service calls to a web service in our control).
  • To ensure that this test/configuration channel is locked down by a simple, shared piece of code that determines if the test/configuration channel can run (on which the enterprise can sign off as a whole), and provides the structure of the test results.
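
A rough sketch of how the shared lock-down piece and a check runner might hang together, using the Requirement/RequirementResult shapes above (everything here is invented for illustration, not a finished design):

using System;

public static class TestChannelGate
{
    // the one shared, signed-off-on rule: e.g. only answer requests from the developer/build subnet
    public static bool MayRun(string remoteAddress)
    {
        return remoteAddress != null && remoteAddress.StartsWith("10.1.");   // placeholder subnet
    }
}

public class RequirementRunner
{
    public RequirementResult Run(Requirement requirement, Func<string> check)
    {
        var result = new RequirementResult { Requirement = requirement };
        try
        {
            result.Result = check();     // e.g. "found 3 stored procedures referencing XXX"
            result.Success = true;
        }
        catch (Exception ex)
        {
            result.Success = false;      // the channel reports what's broken instead of blowing up
            result.Result = ex.Message;
        }
        return result;
    }
}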

Summary

I think it would be pretty useful.

It would get even more useful over time – with the system getting more documented with each additional release.  This is the kind of detailed documentation many folks would drool over – much more useful than the documentation provided by writing unit tests, which, in my experience, only 10% of developers find useful.

I think it would help all the folks involved – developers, DBAs, network guys, permissions guys, deployment guys – talk the same language and have the same set of tools to determine whether the work was “done”, rather than being at odds with each other: “you need to communicate with me in MY language!” 

The next project I get to start from scratch, I’m going to try this.

Catching up on Facebook Quickly–Try #1

I had removed the facebook app from my phone because it was taking too much time.  I dreamed of writing my own app that would let me catch up on facebook quickly, a week at a time.  This afternoon, I got some time to play.  

Three Hours Later

image

Research:

  • I had to go to developer.facebook.com, register, and create an application, to get an application ID.  (well, maybe I didn’t, but I would eventually.)
  • I tried going down the path of Heroku, hosting an app, git, python, etc – but once I found the .Net client, I was done.  (NuGet: Facebook)
    • I ran into problems where heroku’s git client would not present the right ssh key, and I couldn’t find the ssh command line that it had installed to tell it the right key to use.
  • To bypass authentication, I went to https://developers.facebook.com/tools/explorer and generated an access token from there; it needed the read_stream privilege.   The token expires after an hour.
    • The URL to query is “/me/home” (thank you Stack Overflow); it returns the JSON that represents your newsfeed (most recent items first, it looks like)

Then, I created an MVC app to do the querying.  The controller side, to be replaced eventually with fancier login stuff:

var accessToken = "AAACEdEose0cBABI5wFnNe209ddAmRFpWOB9T9O8x2sCaNlc91ZB1u6gqqrxHseMvBKhuDHtkS3KY6KAlIz6Xc8Ps24nvIWKRF... etc";
var client = new FacebookClient(accessToken);
dynamic homeraw = client.Get("me/home");

That’s great, I have a dynamic object, but I’d rather have IntelliSense.  Enter json2csharp (thank you!!!!): it creates some classes for me; a little refactoring / renaming and some hard coding later: 

foreach (var dataraw in homeraw.data)
{
  var homeItem = new HomeItem()
  {
    id = dataraw.id,
    from = new From()
    {
      id = dataraw.@from.id,
      name = dataraw.@from.name,
    },
    message = dataraw.message,
    caption = dataraw.caption,
    name = dataraw.name,
...

And I sent everything off to my view, which spits it out into HTML (the Acronym stuff is not shown here, but I built “SG” out of “Sunjeev Gulati”):

@foreach (var item in Model.Data.data) {
    <div class="infobubble">
        <div>
            <strong title="@item.from.name">@Model.AuthorAcroymn[item.from.id]</strong>:
            @(item.message ?? item.caption ?? item.name)
        </div>
        @if (!String.IsNullOrEmpty(item.picture)) {
            <div> <img class="pic" src="@item.picture" /> </div>
        }
    </div>
}

Added a little styling:

.infobubble 
{
    font-size: 12px;
    border: 1px solid gray;
    float: left;
    display: inline-block;
    padding: 2px;
    margin: 2px;
    word-wrap: break-word;
}

And now for some fun.  Reformatting things in jQuery to be more blocky:

    $(window).load(function () {
        $(".infobubble").each(function () {
            var width = $(this).width();
            var height = $(this).height();

            // determine if there's a picture in here somewhere
            var pic = $(this).find(".pic");
            if (pic.length > 0) {
                var picwidth = $(pic[0]).width();
                $(this).width(picwidth);
                width = $(this).width();
                height = $(this).height();
            } else {
                // no picture.  just text.  assume its already very wide
                // bring the width down till its closer to square
                for (var i = 0; i < 10; i++) {
                    if (width > height * 2.0) {
                        $(this).width(width / 2.0);
                        width = $(this).width();
                        height = $(this).height();
                    } else if (height > width * 2.0) {
                        $(this).height(height / 2.0);
                        width = $(this).width();
                        height = $(this).height();
                    } else {
                        break;
                    }
                }
            }
            $(this).data("width", width);
            $(this).data("height", height);
        });
        $("#container").masonry({ itemSelector: '.infobubble' });
    });

This does a few things:

  • If it has a picture, size to the picture.  This usually works (though the “WAL-MART” one at the bottom of the screenshot is an interesting example; I’ll have to check for overly long text)
  • Otherwise, try to size it till it’s kinda-square.
  • Then use jquery.Masonry to fill the space.

Where to go from here

  • Use some pre-defined sizes so that Masonry has an easier time filling a wall?
  • Use photo pictures, but really small, to identify who wrote a post?
  • Deal with odd-sized things, or find a better algorithm for blockifying things.  Probably try standard widths like 50, 100, 200, and 400, then move on.
  • Decide what to do with comments / likes, etc; support for other kinds of facebook items (only time will reveal them to me)
  • Going further back in time, rather than just a single call.    Try to make it infinite scrolling?  Or, call the service over and over till I have the data I want, re-sort it, and then display it?  While being respectful of their service?
  • Actually go through the pain of logging the user in to facebook.
  • Host the app somewhere?  I’m not sure if it violates any facebook laws, like maybe there’s one that says “you cannot write an app that replaces the newsfeed”?

If this app actually gets written, some of the features that would make it useful:

  • Enter a date range to view over (defaulting to the last time you were in)
  • Inline see comments (if a reasonable number, damn you Takei)
  • Filter down by person or type of post
  • Sort by person, size, photo.
  • Keep track of items seen or not-seen; Star items to look at later; mark all as read.
  • Automatic mute / filters.
  • Jump to item in facebook proper to go do something more with it.

Of course, that’s pie in the sky.   If you want a crack at it, go for it.   What I probably could do in short order:

  • Allow the user to enter the token in a textbox, so anybody could use it (without having to tackle the facebook login hurdle)
  • Host it somewhere for people to have fun with
  • Put the source on github

That would be “a good milestone”.    Maybe next time.

Nevertheless:  that was fun.

Automerging in TFS

There’s an ongoing thread in my head on “what’s different in the land of Feature Branches”, but it hasn’t fermented into something postable yet.  However, there’s one piece of low-hanging fruit I can pluck: automatic merging between branches.

In the beginning, there was a branch…

First day hanging out with this team.  The client already has a stellar team of developers; we were discussing how we could work with them on this “other” feature that they don’t have time to handle. Overly dramatized:

  • We:  Pray tell, dear client, where shalt we code? 
  • Client:  Forsooth!  Thy code may be smelly as in fish; and perhaps thy project shalt be backburnered; thus thou shalt code here:  a subbranch from our development branch.
  • We: That shalt be wonderful, for we shall make this place our own, and be merry.

2 months go by.  The feature takes form, completely changes, and takes form again, and our code [B] is not so smelly.  However, we are also two months out of sync with their development branch, and we’re getting to the spot where we could think about releasing.  The problem:  they have had one release for feature [A], and have another feature coming up [C] which is not ready to go.

image

The painful merge

We ended up merging our code [B] back into their code [C] … and then followed their normal release path up to QA and out.   Luckily, we were able to extract “just our feature” (plus a few extra styles) without moving their feature [C] (but that was luck, really).

That merge took a while:

  • 3 days: Dev1 to Dev2, code + database changes + 67 conflicts.  Dev2 now contains A+B+C. Merging sprocs outside of version control can be painful, thank you Beyond Compare for being wonderful.
  • 1.5 days: Dev2 back to Dev1, mostly dealing with merging (the same) stored procedures and schema, 4 (easy) code conflicts.   Dev1 now contains A+B+C.
  • Easy:  Parts of Dev1 (the “most recent commits”) to QA.  QA now has A+B and very little of C (a few styles crept in). 
  • Again: We were lucky that there was almost no overlap between B and C. 

Having no desire to redo that pain, we came up with a plan.

TeamCity Automerge script under TFS

We use TeamCity as our “easily installable build server”, so we set up a daily task at 5:30 in the morning to automatically merge anything new in our parent branch down to our branch:

$tf = get-item "c:\program files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\TF.EXE"
# http://youtrack.jetbrains.com/issue/TW-9050
& $tf workspaces /server:http://TFSSERVER/tfs/defaultcollection
# It's very important that the right branch (source of automerge) be listed below. 
& $tf merge /candidate $/PROJECTNAME/Dev/A . /r
& $tf merge            $/PROJECTNAME/Dev/A . /r /noprompt
$a=$LastExitCode
#if ($a -eq 0) { 
    & $tf checkin /comment:"automerge from A" /recursive /noprompt /override:"no workitem applies"
    $a=$LastExitCode
    # checkin with no changes yields a errorlevel 1
    if ($a -eq 1) { $a=0 } 
#}
# move this out to a separate build step to be sure it runs
& $tf undo . /r
exit $a
  • We had a problem with getting tf to recognize workspaces, hence the extra tf workspaces call.
  • The tf merge /candidate call lists the possible items to merge – used for populating the build log with information for humans.
  • The actual merge yields a 0 if there were no conflict errors.  We save that to return later.  If there are no changes, that’s also a 0.
  • If there were no conflicts, do a checkin.   In this case, no changes is an error, so we ignore that error.
  • Finish up with a tf undo to “unlock” the files on the TFS server. 
  • Return the exit code that would indicate a conflict if there was one.
  • We are running TeamCity under one of our accounts, thus there’s no login information in the calls to TFS.  With most other VCSs, we end up putting passwords in the script; it’s not the best, but there are few alternatives.   Most companies that have a good build infrastructure usually have a build user for these kinds of things, which only the build admins know the password for – which once again would exclude us from using it.

Living with an Automerge

Most days it works fine.  But some days it has conflicts.   When it does have a conflict, it shows up as a failed build in TeamCity:

image

We started off with a 1-week rotation for dealing with conflicts, but that ended up being unfair to Tanner – he got 5 bad days in a row – so we switched to a round-robin “who has not done it recently” approach instead.

On the days that it runs green, opening the build log shows what got merged.  We hardly ever look at this:

image

New Branching Strategy

Having learned something, and having now earned the client’s trust, we rooted our next branch at the QA level, so that our development branch is a peer to theirs.  This is a continuing experiment; there’s more to consider, hence the still-cooking post on Feature Branching.    

Till Later!

Selenium

I started using Selenium with my current client .. on the side, mostly because logging into the website involved: “wait click text text click click text click wait click wait click”. Additionally, any time I compile, it wipes my logged-in session, and thus I get to repeat the dance. 

Last night at my son’s soccer game, I was chatting with another technical parent.. he was starting to use Selenium also!  In his case, Selenium IDE and converting that into code for tests.   Having played a bit with that side of things, and ending up with Selenium WebDriver + some ideas of how to do it nicely, I decided to compose this post to share my experiences so far. 

  • Code on Github (learning github for windows);
  • VS2012, Nuget package restore enabled;
  • Total time taken to write the code (from scratch) = less than the time taken to write this post talking about the code.  

Targeting NerdDinner

My client would have a cow (and a chicken, and maybe several other barnyard animals) if I put any of their code outside their network.  And I’d get fired.  So I recreated the pattern I am using against www.nerddinner.com.  A certain dude whom I respect created it as a demo website. It had about the right mix of complexity – it was not just searching on google  – but it wasn’t navigating gmail either.  Thank you Scott.

First Attempt at Code.

See github branch here; Excerpt from here

        [Test]
        public void MainPage_CanSearchForDinners_ButNeverFindsAnyInKY()
        {
            using (IWebDriver driver = new FirefoxDriver())
            {
                driver.Navigate().GoToUrl("http://www.nerddinner.com");
                var input = driver.FindElement(By.Id("Location"));
                input.SendKeys("40056");
                var search = driver.FindElement(By.Id("search"));
                search.Click();
                var results = driver.FindElements(By.ClassName("dinnerItem"));
                // at this point, i don't know what to do, as there's never any search results.   
                Assert.AreEqual(0, results.Count,
                                "No dinners should be found.. omg, if this works, then its worth it to change the test");
            }
        }
  • This code is using NUnit (out of scope for this post)
  • This is how you navigate
  • This is how you find things
  • This is how you send text
  • This is how you click things
  • Every time you run, you’re starting off with no cookies, so you’ll have to log in every time.

Whee.  Good start.  Okay, now let’s get serious[er]. 

Second Pass At Code

Per the recommendations of Jim Holmes (@aJimHolmes, http://frazzleddad.blogspot.com/), whom I met in person at http://www.codepalousa.com/ and for whom I have also developed a great deal of respect, I know what smells funny:

  • The first pass is very brittle.  If somebody changes an ID or a classname, you have a LOT of tests to go change.   Solution: Page Pattern
  • What does one do when one does not have a second bullet, but there seems like there should be one?

What I accomplish with all this Jazz:

image

The overall branch is here; the rewritten test here:

        [Test]
        public void MainPage_CanSearchForDinners_ButNeverFindsAnyInKY()
        {
            MainPage.LocationToSearch.SendKeys("40056");
            MainPage.SearchButton.Click();
            var results = MainPage.PopularDinnerSearchResults;

            Assert.AreEqual(0, results.Count,
                            "No dinners should be found.. omg, if this works, then its worth it to change the test");

        }

Much simpler.  The Page Helper Class is here, and partially looks like:

    public class MainPage : PageBase
    {
        public MainPage(IWebDriver driver)
            : base(driver)
        {
            Wait.Until(d => d.Title == "Nerd Dinner");
        }

        public static MainPage NavigateDirectly(IWebDriver driver)
        {
            driver.Navigate().GoToUrl("http://www.nerddinner.com");
            return new MainPage(driver);
        }

        public IWebElement LocationToSearch
        {
            get { return Driver.FindElement(By.Id("Location")); }
        }

        public IWebElement SearchButton
        {
            get { return Driver.FindElement(By.Id("search")); }
        }

  • The constructor takes and stores a reference to the Driver.  The calling program is responsible for creating and disposing the driver (as it’s an expensive resource)
  • It uses a PageBase class which creates an additional WebDriverWait (the Wait variable)
  • The constructor waits until we know we’re on the right page.  This allows us to click something somewhere else, and new up this object, and then wait till the page actually loads. 
  • Because this is the root of the website, I include a NavigateDirectly() routine which says “I don’t care where you were, now go to this page”.  I only do this on overview or login pages, the kind not derived from a click.
  • It exposes IWebElements to the callers (tests) so they don’t need to know where on the page various things are located.
  • In WebForms – I have to implement extension methods which search IWebDriver and IWebElement for ids Ending In Text, because ID=”btnSearch” ends up being “ctl00.BoogerBooger_04.HumDeDum_03.YabbaDabba.WickedBackbtnSearch” (a sketch of this follows below)
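
Neither PageBase nor the ends-with lookup made it into the excerpts above, so here is a minimal sketch of what they might look like (the 10-second timeout and all names are mine, not from the actual repo):

using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public abstract class PageBase
{
    protected readonly IWebDriver Driver;
    protected readonly WebDriverWait Wait;

    protected PageBase(IWebDriver driver)
    {
        Driver = driver;                                          // the caller owns the driver's lifetime
        Wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
    }
}

public static class WebFormsFinderExtensions
{
    // For WebForms pages where id="btnSearch" renders as "ctl00_..._btnSearch"
    public static IWebElement FindElementByIdEndingWith(this ISearchContext context, string idSuffix)
    {
        var match = context.FindElements(By.CssSelector("[id$='" + idSuffix + "']")).FirstOrDefault();
        if (match == null) throw new NotFoundException("No element with id ending in " + idSuffix);
        return match;
    }
}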

Another excerpt (here):

        public List<IWebElement> UpcomingDinners
        {
            get
            {
                var upcomingDinnersUl = (from e in Driver.FindElements(By.ClassName("upcomingdinners"))
                                         where e.TagName == "ul"
                                         select e).FirstOrDefault();
                if (upcomingDinnersUl == null) throw new NotFoundException("could not find ul.upcomingdinners");
                return upcomingDinnersUl.FindElements(By.TagName("a")).ToList(); 
            }
        }
  • When doing complicated finds, I usually search By.ClassName or By.Id or whatever first, and end up with e.TagName second, because they do NOT provide an e.Id or e.ClassName routine.   (You can do e.GetAttribute(“id”) but I’m not sure if that returns null or empty, and nulls can be a bummer).
  • new ByChained(x,y,z) does not mean an element which matches x,y and z, but instead an element x which contains an element y which contains an element z.   
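
For example, a quick sketch (ByChained lives in the WebDriver support assembly, OpenQA.Selenium.Support.PageObjects if I remember right):

// nested containment: an <a> inside something with class "upcomingdinners"
// inside something with id "content" -- NOT one element matching all three at once
var links = Driver.FindElements(new ByChained(
    By.Id("content"),
    By.ClassName("upcomingdinners"),
    By.TagName("a")));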

Third Pass – the Developer Helper

If I run the Console App that hosts the library with all the page helpers, I get something like this:

image

  • The purpose of this app is to let me quickly get to sections of my website that I need to play around in.
  • The keywords yield a more detailed list; the more detailed list is used in navigation; using the old trick of:
    delimited = "|" + delimited + "|"; if (delimited.Contains("|xxx|")) … 

    which is what we did before arrays #GetOffMyLawn #WheresMyTeeth

  • Once the site is up, it stays there (until I choose somewhere else to go)

Additionally, in tests, I redid the creation/deletion of the driver as follows:

        [TestFixtureSetUp]
        public void FixtureSetup()
        {
            
        }

        [TestFixtureTearDown]
        public void FixtureTeardown()
        {
            AbandonDriver();
        }

        [SetUp]
        public void TestSetup()
        {
            if (Driver == null) Driver = new FirefoxDriver();
            MainPage = MainPage.NavigateDirectly(Driver);
        }

        [TearDown]
        public void TestTeardown()
        {
            
        }

        private void AbandonDriver()
        {
            if (Driver != null) Driver.Dispose();
            Driver = null;
        }

  • I am not creating a driver for every test; however if a test is going to mess up the relative state of a driver, it can be abandoned and a new one started.
  • This saves several seconds per test, which helps once you break the 5-6 test mark.  Especially if what is saved is not as much “start the driver”, but more “start the driver, go to the web site, log in, and get to the page of interest”.

Conclusion

Once you’ve done it once or twice, it’s remarkably easy to write some pretty neat tests.   This may not be the perfect way to do it, but it’s something that works for me; hope it works for you!

MVC Concepts

I’m reaching the end of my first professional (== I got paid) MVC application.

I learned a lot. This is an attempted visualization of most of the topics that I learned.   You may need to click in to see the details.

In addition, as I found good sources of information on the ‘nets, I included links in my source code in comments.  Here is a compilation of those nuggets of goodness. If your post is mentioned on this list — thank you.

Link Goodness
  • http://stackoverflow.com/questions/4367723/get-enum-from-description-attribute – Embedding metadata about things in enumerations
  • http://ayende.com/blog/1595/nhibernate-mapping-creating-sanity-checks – (NHibernate) Integration test to validate mapping completeness
  • http://www.codingoptimist.com/2010/11/nhibernate-hidden-gems-speed-up.html – (NHibernate) Caching configurations (gain: 1 second, not worth it, dropped)
  • http://stackoverflow.com/questions/325156/calling-generic-method-with-a-type-argument-known-only-at-execution-time – Having multiple types with shared logic was a wonderful spot for co-variance, but after about an hour, it was time to move on
  • http://www.nickriggs.com/posts/getting-the-id-and-name-attribute-generated-by-typed-html-helpers/ – I had to drop down to a TextBox with an @class setting, and thus I couldn’t use TextBoxFor, so this came in very useful
  • http://www.codeproject.com/KB/cs/stringenum.aspx – Used by a coworker somewhere in the guts of an automatic model-to-.Where()-clause LINQ query extension
  • http://www.mikesdotnetting.com/Article/125/ASP.NET-MVC-Uploading-and-Downloading-Files – Made it oh-so-easy for me to save and upload intermediate files from this file processing system
  • http://stackoverflow.com/questions/1151987/can-i-set-an-unlimited-length-for-maxjsonlength-in-web-config/7207539#7207539 – Before we got the grid fetching only 1 page at a time, we were getting everything – thus needing this
  • http://code.google.com/p/autofac/wiki/Mvc3Integration – Autofac was just amazing. It made sense to me! It’s what I would have done if I was writing an IoC container…
  • http://slynetblog.blogspot.com/2012/03/using-aspnet-mvc-4-webapi-with.html – Using ASP.NET MVC 4 with NHibernate and AutoFac .. pointing me in the right direction
  • http://forum.jquery.com/topic/span-text-new-text-not-working-in-ie – <span/> to <span></span> did the trick, then I could .text() it
  • http://stackoverflow.com/questions/10588587/jqgrid-does-not-render-correctly-in-chrome-chrome-frame – An annoying bug that sent me searching for at least 15 minutes
  • http://stackoverflow.com/questions/5248183/html-partial-vs-html-renderpartial-html-action-vs-html-renderaction – When the views became too large, I could refactor | extract view
  • http://stackoverflow.com/questions/2498187/system-web-extensions-trouble-deploying-net-framework-4-website-on-iis7 – The target server did not have MVC3 installed, so I had to include it in the XCOPY deploy

Software Estimation Review #1

One of my recent projects had this burndown:

At the start of the project, the “desired” hours were coming down at the speed of the “high” estimate.  If I had used the first 5 data points to predict the end of the project, it would have been near T=7, at a cost of 6N or so, and the estimate was only for 5N.   Not good.

Luckily, things started to sort themselves out as the project progressed.   However, I wanted to understand what was going on.

At work, I track every slice of time at a project and “top level task” level.   I also note down what exactly I’m doing, and usually its outcome – success, interrupt, break, etc.  This is especially useful if there is fragmentation: you can see how interrupted you were – a metric to show pain.

I exported this detail into Excel, and created an item by item grid of time taken for each line item  in the original estimate.

In the process I found several items that I worked on that didn’t have a home in the estimate, and I got “actual hours worked” for each of the line items.   It was shocking.  Here is an abridged version:

  • I saw tasks much larger than the original estimate.
  • I saw some tasks much smaller than the original estimate.
  • I missed several features.
  • I left out entire categories such as refactoring (as the project grows), documentation,  styling and usability.

Possible Interpretations

I did well.

The project was done within the time allotted.  This was a success.  Perhaps the individual line items were not, but the overall project was – I was lucky that I overestimated several items.  If I had tried to squeeze every line item, I would have been in trouble.

Could more Up Front Design yield a better estimate?

The Waterfall thought would be that I didn’t do enough up-front design.  However, I spent longer putting this estimate together than I should have – because in this job, putting together an estimate is not billable time!  If I had spent more time on it, I would have had more line items – yielding a higher estimate – yielding a greater risk that the client would not approve the budget – and I would have still gotten it done on time, perhaps even in less time.

I have done this before = takes less time

I had previously done a similar project.  As a result,  I estimated several items fairly low. Examples:

  • 5 to 8 (actual: 23)
  • 5 to 7 (actual: 12)
  • 5 to 7 (actual: 14).

This is the equivalent of a handyman stating: “I’ve built a house before, so hanging drywall won’t take as long.”   Incorrect!   It takes time to build software, even if you know exactly what you are doing.

“Sure that Technology will be easy to integrate”

My coworker (on loan to me for two weeks) spent a lot of time trying to get a grid to work correctly.   Actually, the grid worked fine, the helper classes to interface it with MVC did not.     Trying to get them to work took the majority of the hours on one of the pages.

In the end, he scrapped the helper classes he had downloaded – they worked only in specific circumstances – and rolled his own – and coded the rest of the page in record time.

I’m not sure which way to go on this.  Perhaps having a bucket of “research” hours?    Timebox the items that we research, to save research budget in case it’s needed later?   It seems like every project I’ve been on has had one or two things that took some time to get the approach figured out.

I left out parts of the project from the estimate

There were several items that I had not estimated up front that are a part of any project.  See below for a running checklist.

I skipped parts of the estimate

Some functionality turned out to be trivial, or unnecessary. For example, we went straight from a grid of Endor to editing Endor without a stop at a detail page for Endor.   Or, one of the setup pages – thought to be three separate pages – collapsed into a single grid.

These “mistakes” are what saved the project.

Overall Analysis

I think my mistake was trying to specify everything in the estimate at a line by line level.  As DPR207 at TechEd 2010 pointed out on slide 16, there are different levels of uncertainty as a project progresses.  I was estimating at too granular a level too early – guaranteeing I would miss items, as well as specify some items that were unnecessary.

Doing it Differently

In this particular project, how else could I have estimated it?

Agile, Story Points, etc.

Using Story Points and approaching a project in an agile way (Scrum-like) are my favorite ways of handling project estimation and management.  However: in my role as a consultant, with clients pre-approving blocks of hours (based on a high-end estimate), I don’t get to do that.   So I’ll pass on that solution.   I HAVE to have a number of hours up front, or I need to find a very understanding client.

Estimate at a higher level

Rather than estimating at a very detailed level, I could estimate in bigger chunks like this:

  • Level 1: Systems – examples: “A Website”, “A Batch App” – estimate buckets: 50, 100, 150, 200, 300, 500 hours
  • Level 2: Functions / Features / Pages – examples: Grid of Foo, Editing Foo, Selecting Bars for a Foo, Reading in an xxx file, Validating xxx information – estimate buckets: 10-20, 20-30, 30-50 hours

Fibonacci Numbers

I will definitely continue using Fibonacci numbers for estimation.  It saves me brain power – it’s either a 5 or an 8.  I don’t have to argue with myself whether it’s a 7 or an 8 or a 9.

I use a slightly modified sequence: 0.5, 1, 2, 3, 5, 8, 15 … and somewhere in there, I jump into adding zeros: 5, 10, 20, 30, 50, etc.

Going Forward

Jim Benson (@ourfounder) gave a presentation at CodePaLousa: “We are Not Engineers“.   (I tried to find a link to a PowerPoint, and then I remembered: it was truly a PowerPoint-free presentation. I did find similar materials here.)  The gist of the presentation was:  “Humans are horribly designed for doing estimation”.   If I estimate based on gut feel, I’m going to be wrong.   Thus, for my job, I either need to stop estimating, or start to build a toolkit.

This is the toolkit so far:

Keep track of all time and match it back up against the estimate at a line by line level.

It takes about 15-30 minutes to export from Harvest and match things up in Excel.   It’s a form of instant feedback.

Estimate in larger chunks.

Rather than counting the number of fields on a page (which is what I did previously), I need to find the average time to write a page, and then double it if it’s a big page.    Try not to estimate in anything smaller than, say, 10 hours, for a chunk – a page, a file processing feature, etc.

Keep a checklist

My checklist at the moment:

  • Initial project setup
    • Machine setup
    • Project setup – layers (DAL, etc) – into SVN
    • Stub version of project that can be deployed
  • Test data strategy (in support of integration testing)
    • Test data creation and deletion
    • Setting up automated integration tests to run automatically
  • Individual Items (Project specific)
    • Larger chunks
    • Include testing time for each item – writing unit or integration tests, and testing by hand.
  • Growing pains
    • Refactoring is inevitable (and not specific to any particular item)
    • Configuration and Deployment (as the project enters different environments)
    • Merging between branches if branches are called for (any time there is a phased release, or a separation between UAT, QA and DEV)
    • Necessary Documentation (after complicated things get done)
  • Project Management
    • Dependent on each client. 10% of a 35-hour week = 3.5 hours of email, status updates, demo prep, and demoing – which is about right for a good client.  For other clients, the number has been closer to 20%.
  • Research bucket
    • For the unfortunate item(s) that will pop up that just don’t work
    • Also can be used to take extra care to simplifying something so it stays more maintainable.
  • Usability, Look and Feel
    • Cross browser testing
    • “Making it Pretty” – fonts, colors, grids
    • “Making it more functional” – renaming things, page organization, number of clicks to do something
  • I18N and L10N
    • Dependent on client and project.

The Request

How do you estimate your projects?  How close do you get?   I truly want to know.  Shoot me a comment or write it up and add a link.

Test Data Creators (Integration Tests)

Everybody seems to love Unit Tests

I agree, they are wonderful. I have lots of logic that is unit tested … and it’s easy to set up (especially with tools like Moq)…

But it’s not what I rely on.  I have found it to be too limited to give me the confidence I’m looking for as I write a system.  I want as much tested as I can get – including the data access layers – and how everything fits together – and that my dependency injectors are working correctly.

Another view: in my current project, I’m using NHibernate as the data provider.  The general consensus on mocking NHibernate is: don’t do it. Instead, use an in-memory database (didn’t work for me – I had to maintain different mapping files), or write an IRepository around it.

When I do that, what I find is that most of the logic that needs testing is in the subtleties around my queries (LINQ and otherwise) – the rest is plumbing data from one representation to another.  While unit testing that is valid, it does not cover the places where I find most of my failures.  Stated in GWT syntax, my tests would be “GIVEN perfect data, WHEN logic is executed, THEN the app does the right thing” – with “perfect data” being the elusive part.

I have tried providing a List<T>.AsQueryable() as a substitute data source in unit tests – and that works well, as long as my queries do not get complicated (involving NHibernate .Fetch and so on).   When the queries grew beyond my ability to mock them with .AsQueryable(), my “test” situation (LINQ against a list) started to differ significantly from the “real” situation (LINQ against a database), and I started to spend too much time getting the test just right and no time on real code.
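
For the simple cases, the substitution looked roughly like this (a sketch with invented names; the real repository wraps NHibernate, and this only holds up until Fetch and friends show up):

using System.Collections.Generic;
using System.Linq;

public interface IRepository<T>
{
    IQueryable<T> Query();
}

// Production: an implementation that returns session.Query<T>() from NHibernate.
// Unit tests: hand the query logic an in-memory list instead.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> _items;
    public InMemoryRepository(IEnumerable<T> items) { _items = items.ToList(); }
    public IQueryable<T> Query() { return _items.AsQueryable(); }
}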

My Solution – Test Data Creators

My solution for the past 5 years over multiple projects has been “Integration Tests”, which engage the application from some layer (controller, presenter, etc) all the way down to the database.

“Integration”,”Unit”, and “Functional” tests — there seem to be a lot of meanings out there. For example, one boss’s idea of a “Unit” test was, whatever “Unit” a developer was working on, got tested. In that case, it happened to be the “Unit” of batch-importing data from a system using 5 command line executables. Thus, for this article only, I define:

  • Unit Test – A test called via the nUnit framework (or similar) that runs code in one target class, using mocks for everything else called from that class, and does not touch a database or filesystem
  • Integration Test – A test called via the nUnit framework (or similar) that runs code in one target class, AND all of the components that it calls, including queries against a database or filesystem
  • Functional Test – Something I haven’t done yet that isn’t one of the above two
  • Turing Test – OutOfscopeException

Having built these several times for different projects, there are definite patterns that I have found that work well for me. This article is a summary of those patterns.

Pattern 1: Test Data Roots

For any set of data, there is a root record.
Sometimes, there are several.
In my current project, there is only one, and it is a “company”; in a previous project, it was a combination of “feed” and “company”.

The Pattern:

  • Decide on a naming convention – usually, “TEST_”+HostName+”_”+TestName
  • Verify that I’m connecting to a location where I can delete things with impunity — before I delete something horribly important (example: if connection.ConnectionString.Contains(“dev”))
  • If my calculated test root element exists, delete it, along with all its children.
  • Create the root and return it.
  • Use IDisposable so that it looks good in a using statement, and any sessions/transactions can get closed appropriately.

Why:

  • The HostName allows me to run integration tests on a build server at the same time as on a local machine, both pointed at a shared database.
  • I delete at the start (rather than at the end) so that test data is left behind after the test runs. Then I can query it manually to see what happened. It also leaves behind excellent material for demoing functionality to the client and doing ad-hoc manual testing.
  • The TestName allows me to differentiate between tests. Once I get up to 20-30 tests, I end up with a nice mix of data in the database, which is helpful when creating new systems – there is sample data to view.

Example:

using (var root = new ClientTestRoot(connection,"MyTest")) { 
    // root is created in here, and left behind. 
    // stuff that uses root is in here.  looks good. 
}
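
For what it’s worth, a minimal sketch of such a root might look like the following – DeleteClientCascade and InsertClient are placeholders for whatever delete/insert strategy the project actually uses (see Patterns 3 and 4):

// A sketch only: the shape of a test data root. DeleteClientCascade and InsertClient
// stand in for whatever delete/insert strategy the project actually uses.
using System;
using System.Data;

public class ClientTestRoot : IDisposable
{
    private readonly IDbConnection _connection;
    public string ClientName { get; private set; }
    public int ClientId { get; private set; }

    public ClientTestRoot(IDbConnection connection, string testName)
    {
        _connection = connection;

        // Sanity check: only run against a database I can delete from with impunity.
        if (!connection.ConnectionString.Contains("dev"))
            throw new InvalidOperationException("Refusing to create test data outside of dev.");

        // Naming convention: TEST_ + host name + _ + test name.
        ClientName = "TEST_" + Environment.MachineName + "_" + testName;

        // Delete any leftovers from a previous run, then create a fresh root.
        DeleteClientCascade(ClientName);
        ClientId = InsertClient(ClientName);
    }

    public void Dispose()
    {
        // Intentionally does NOT delete - the data is left behind for manual
        // queries and demos, and gets cleaned up at the start of the next run.
    }

    private void DeleteClientCascade(string clientName) { /* delete children, then the client row */ }
    private int InsertClient(string clientName) { /* insert and return the new id */ return 0; }
}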

Pattern 2: Useful Contexts

Code Example:

using (var client = new ClientTestRoot(connection,"MyTest")) { 
    using (var personcontext = new PersonContext(connection, client)) { 
       // personcontext.Client
       // personcontext.Person
       // personcontext.Account
       // personcontext.PersonSettings
    }
}

I create a person context, which has several entities within it, with default versions of what I need.

I also sometimes provide a lambda along the lines of:

new PersonContext(connection, client, p => { p.LastName = "foo"; p.Married = true; })

to allow better customization of the underlying data.

I might chain these things together. For example, a Client test root gives a Person context gives a SimpleAccount context … or separately, a MultipleAccount context.
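
A sketch of what one of these contexts looks like, building on the ClientTestRoot sketch above – the Person entity and InsertPerson helper are stand-ins:

// A sketch of a context object with defaults plus an optional customization lambda.
// Account and PersonSettings would be created the same way and are omitted here.
using System;
using System.Data;

public class Person
{
    public int PersonId { get; set; }
    public string LastName { get; set; }
    public bool Married { get; set; }
}

public class PersonContext : IDisposable
{
    public ClientTestRoot Client { get; private set; }
    public Person Person { get; private set; }

    public PersonContext(IDbConnection connection, ClientTestRoot client,
                         Action<Person> customize = null)
    {
        Client = client;

        // Sensible defaults, overridable per test via the lambda.
        Person = new Person { LastName = "TestPerson", Married = false };
        if (customize != null) customize(Person);

        Person.PersonId = InsertPerson(connection, client.ClientId, Person);
    }

    public void Dispose() { /* nothing to clean up - data is left behind on purpose */ }

    private int InsertPerson(IDbConnection connection, int clientId, Person p)
    { /* insert and return the new id */ return 0; }
}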

Pattern 3: Method for Creating Test Data can be Different from What Application Uses

By historical example:

  • Project 1 (2006) – App: DAL generated by Codesmith. Test data: OracleConnection + OracleCommand, by hand.
  • Project 2 (2007) – App: DAL generated by Codesmith. Test data: generic ADO.NET, using metadata from a SELECT statement plus naming conventions to derive INSERT and UPDATE statements from DataTables.
  • Project 3 (2008) – App: DAL generated by Codesmith. Test data: the same Codesmith DAL – we had been using it for so long that we trusted it, so we used it in both places.
  • Project 4 (2010) – App: existing DAL + business objects. Test data: Entity Framework 1.
  • Project 5 (2011) – App: WCF + SqlConnection + SqlCommand + stored procedures. Test data: none created! (see Pattern 7 below)
  • Project 6 (2012) – App: NHibernate with fancy mappings (References, HasMany, cleaned-up column names). Test data: NHibernate with simple mappings – raw column names, no References, no HasMany, etc.

The test data creator will only be used by tests — not by the application itself. It maintains its own network connection. However you do it, get it up and running as quickly as you can – grow it as needed. Refactor it later. It does NOT need to be clean – any problems will come to light as you write tests with it.
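
To make Project 2’s entry above concrete, the idea can be sketched roughly like this – a hypothetical helper, using a data reader’s schema rather than a DataTable:

// A rough sketch of the Project 2 idea: read column metadata from a
// "select * ... where 1=0", then build an INSERT by naming convention.
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ConventionInsertBuilder
{
    public static string BuildInsert(IDbConnection connection, string tableName)
    {
        var columns = new List<string>();

        using (var cmd = connection.CreateCommand())
        {
            // No rows come back; we only want the schema.
            cmd.CommandText = "select * from " + tableName + " where 1=0";
            using (var reader = cmd.ExecuteReader(CommandBehavior.SchemaOnly))
            {
                for (var i = 0; i < reader.FieldCount; i++)
                    columns.Add(reader.GetName(i));
            }
        }

        // Naming convention (assumed here): one parameter per column, named after the column.
        return "insert into " + tableName +
               " (" + string.Join(", ", columns) + ")" +
               " values (" + string.Join(", ", columns.Select(c => "@" + c)) + ")";
    }
}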

Pattern 4: Deleting Test Data is Tricky Fun

The easiest way, which everybody seems to agree on, is: drop the database and reload. I’ve had the blessing of being able to do this exactly once; it’s not the norm for me – usually I deal with shared development databases, or complicated scenarios where I don’t even have access to the development database schema.

Thus, I have to delete data one table at a time, in order.

I have used various strategies to get this done:

  • Writing SQL DELETE statements by hand — this is where I start.
  • Putting ON DELETE CASCADE in as many places as makes sense. For example, you probably don’t want to delete all Employees when deleting a Company (how often do we delete a company? Are you sure?), but you could certainly delete all User Preferences when deleting a User. Use common sense.
  • Create a structure that represents how tables are related to other tables, and use that to generate the delete statements.

This is the hardest part of creating test data. It is the first place that breaks — somebody adds a new table, and now deleting fails because foreign keys are violated. (long term view: that’s a good thing!)

I got pretty good at writing statements like:

delete from c2
where c2.id in ( 
    select c2.id from c2
    join c1 on ...
    join root on ....
    where root.id = :id )

After writing 4-5 of them, you find the pattern: the delete query for a child of C2 looks very similar to the delete query for C2, with a little bit more added. All you need is some knowledge of what to delete first, and where you can go after that.

How Tables Relate

I no longer have access to the codebase, but as I remember, I wrote something like this:

var tables = new List<TableDef>(); 
var table1 = new TableDef("TABLE1","T1"); 
{ 
     tables.Add(table1); 
     // each SubTable call records the table, its alias, and how it joins to its parent
     var table2 = table1.SubTable("TABLE2","T2","T1.id=T2.parentid"); 
     { 
         tables.Add(table2); 
         // etc etc
     }
     // etc etc
}

I could then construct the DELETE statements using the TableDef’s above – the join strategy being the third parameter to the .SubTable() call.
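
I no longer have that code, but the shape of TableDef was roughly this – BuildDeleteSql is a reconstruction, not the original:

// A reconstruction of the TableDef idea: each child remembers how it joins back to
// its parent, so a DELETE ... WHERE id IN (SELECT ...) can be generated by walking
// the chain of joins up to the root.
public class TableDef
{
    public string Name { get; private set; }
    public string Alias { get; private set; }
    public string JoinToParent { get; private set; }   // e.g. "T1.id=T2.parentid"
    public TableDef Parent { get; private set; }

    public TableDef(string name, string alias)
    {
        Name = name;
        Alias = alias;
    }

    public TableDef SubTable(string name, string alias, string joinToParent)
    {
        var child = new TableDef(name, alias);
        child.Parent = this;
        child.JoinToParent = joinToParent;
        return child;
    }

    // Builds something like:
    //   delete from TABLE2 where TABLE2.id in (
    //     select T2.id from TABLE2 T2
    //     join TABLE1 T1 on T1.id=T2.parentid
    //     where T1.id = :id )
    public string BuildDeleteSql()
    {
        var sql = "delete from " + Name + " where " + Name + ".id in (" +
                  "select " + Alias + ".id from " + Name + " " + Alias;

        // Add one join per ancestor, all the way up to the root.
        for (var t = this; t.Parent != null; t = t.Parent)
            sql += " join " + t.Parent.Name + " " + t.Parent.Alias + " on " + t.JoinToParent;

        // Filter on the root's id.
        var root = this;
        while (root.Parent != null) root = root.Parent;
        return sql + " where " + root.Alias + ".id = :id)";
    }
}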

Slow Deletes

I ran into a VERY slow delete once, on Oracle. The reason was, the optimizer had decided that it was faster to do a rowscan of 500,000 elements than it was to do this 7-table-deep delete. I ended up rewriting it:

select x.ROWID, ...;  foreach rowid { delete ... where rowid = ... }
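
Spelled out as a sketch (hypothetical table and column names – the real query had the full 7-table join):

// Sketch: run the expensive multi-join query once to gather ROWIDs, then delete
// row-by-row by ROWID, which the optimizer handles cheaply.
using System.Collections.Generic;
using System.Data;

static class SlowDeleteWorkaround
{
    public static void DeleteByRowId(IDbConnection connection, int rootId)
    {
        var rowIds = new List<string>();
        using (var select = connection.CreateCommand())
        {
            // In the real case this was the full 7-table-deep join back up to the root.
            select.CommandText = "select x.rowid from deep_child x where x.root_id = :id";
            var idParam = select.CreateParameter();
            idParam.ParameterName = "id";
            idParam.Value = rootId;
            select.Parameters.Add(idParam);

            using (var reader = select.ExecuteReader())
                while (reader.Read())
                    rowIds.Add(reader.GetValue(0).ToString());
        }

        foreach (var rowId in rowIds)
        {
            using (var delete = connection.CreateCommand())
            {
                delete.CommandText = "delete from deep_child where rowid = :rid";
                var ridParam = delete.CreateParameter();
                ridParam.ParameterName = "rid";
                ridParam.Value = rowId;
                delete.Parameters.Add(ridParam);
                delete.ExecuteNonQuery();
            }
        }
    }
}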

Moral(e): you will run into weird deletion problems. That’s okay, it goes with the territory.

Circular Dependencies

Given:

  • Clients have People
  • Feeds have Files For Multiple Clients
  • Files Load People
  • A loaded person has a link back to the File it came from

This led to a situation where if you tried to delete the client, the FK from Feed to Client prevented it. If you tried to delete the feed, the FK from People back to File prevented it.

The solution was to NULL out one of the dependencies while deleting the root, to break the circular dependency. In this case, when deleting a Feed, I nulled the link from person to any file under the feed to be deleted. I also had to do the deletes in order: Feed first, then Client.

Example:
Here’s some real code from my current project, with table names changed to protect my client:

var exists =
	(from c in session.Query<OwnerCompany>() where c.name == companyNameToLookFor select c).
		FirstOrDefault();
if (exists != null)
{
	using (var tran = session.BeginTransaction())
	{
		// rule #1: only those things which are roots need delete cascade
		// rule #2: don't try to NH it, directly delete through session.Connection

		// ownercompany -> DELETE CASCADE -> sites
		// sites -> manual -> client
		// client -> RESTRICT -> feed
		// client -> RESTRICT -> pendingfiles
		// client -> RESTRICT -> queue
		// queue -> RESTRICT -> logicalfile
		// logicalfile -> CASCADE -> physicalfile
		// logicalfile -> CASCADE -> logicalrecord
		// logicalrecord -> CASCADE -> updaterecord

		var c = GetConnection(session);

		c.ExecuteNonQuery(@" 
			delete from queues.logicalfile 
			where queue_id in ( 
			   select Q.queue_id 
			   from queues.queue Q
			   join files.client CM ON Q.clientid = CM.clientid
			   join meta.sites LCO on CM.clientid = LCO.bldid
			   where LCO.companyid=:p0
			)
			", new NpgsqlParameter("p0", exists.id)); 

		c.ExecuteNonQuery(@" 
			delete from queues.queue 
			where clientid in ( 
				select bldid
				from meta.sites
				where companyid=:p0
			)
			", new NpgsqlParameter("p0",exists.id)); 

		c.ExecuteNonQuery(@"
			delete from files.pendingfiles 
			where of_clientnumber in (
				select bldid
				from meta.sites
				where companyid=:p0
			) ",
			new NpgsqlParameter(":p0", exists.id));
		c.ExecuteNonQuery(@"
			delete from files.feed 
			where fm_clientid in (
				select bldid
				from meta.sites
				where companyid=:p0
			) ", 
			new NpgsqlParameter(":p0",exists.id)); 
		c.ExecuteNonQuery(@"
			delete from files.client 
			where clientid in (
				select bldid
				from meta.sites
				where companyid=:p0
			) ",
			new NpgsqlParameter(":p0", exists.id)); 

		session.Delete(exists);
		tran.Commit();
	}
}

In this case, ownercompany is the root. And almost everything else (a lot more than what’s in the comments) CASCADE DELETE’s from the tables I delete above.

I did not write this all at once! This came about slowly, as I kept writing additional tests that worked against additional things. Start small!

Pattern 5: Writing Integration Tests Is Fun!

Using a library like this, writing integration tests becomes a joy. For example, a test that verifies only open accounts are seen:

Given("user with two accounts, one open and one closed"); 
{
   var user = new UserContext(testClientRoot); 
   var account1 = new AccountContext(user, a => { a.IsClosed = true;  a.Name = "Account1"; }); 
   var account2 = new AccountContext(user, a => { a.IsClosed = false; a.Name = "Account2"; }); 
}
When("We visit the page"); 
{ 
    var model = controller.Index(_dataService); 
}
Then("Only the active account is seen"); 
{
    Assert.AreEqual(1,model.Accounts.Count); 
    ... (etc)
    Detail("account found: {0}", model.Accounts[0]); 
}

The GWT stuff above is for a different post; it’s an experiment around generating documentation about what should be happening.

When I run this test, the controller is running against a real data service.. which could go as far as calling stored procedures or a service or whatever.
When this test passes, the green is a VERY STRONG green. There was a lot that had to go right for the test to succeed.

Pattern 6: Integration Tests Take Time To Iterate

Unit tests are fast – you can easily run 300-500 of them in a few seconds, so developers run ALL tests fairly often. Integration tests, not so much.

Solution: Use a CI server, like TeamCity, and run two builds:

  • Continuous Integration Build – does the compile, and then runs unit tests on **/bin/*UnitTests.dll
  • Integration Test Build – triggered if the previous build is successful; compiles, and then runs all tests on **/bin/*Tests.dll

I.e., the Integration Test build runs a superset of tests – integration tests AND unit tests both.
This also relies on a naming convention for the test DLLs – *UnitTests.dll being more restrictive than *Tests.dll.
There’s another approach I have used, where integration tests are marked with a category and Explicit() – so that local runs skip them, but the integration server includes them by category name. However, over time, I have migrated to keeping them in separate assemblies – so that the unit test project does not reference any database libraries, keeping it “pure”.
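
For reference, the category-plus-Explicit approach looks roughly like this with NUnit (the fixture and test names here are made up):

// [Explicit] keeps the test out of normal local runs; the CI server includes it
// explicitly by category name ("Integration").
using NUnit.Framework;

[TestFixture]
public class AccountIntegrationTests
{
    [Test]
    [Category("Integration")]
    [Explicit("Requires the shared dev database")]
    public void OnlyOpenAccountsAreSeen()
    {
        // ... controller-to-database test body goes here ...
    }
}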

When working on code, I usually run one integration test at a time, taking 3-4 seconds to run. When I’m done with that code, I’ll run all tests around that component.. maybe 30 seconds? Then, I check it in, and 4-5 minutes later, I know everything is green or not, thanks to the CI server. (AND, it worked on at least two computers – mine, and the CI server).

Pattern 7: Cannot Create; Search Instead

This was my previous project. Their databases had a lot of replication going on – no way to run that locally – and user and client creation was locked down. There was no “test root creation” – it got too complicated, and I didn’t have the privileges to do it even if I had wanted to tackle the complexity.

No fear! I could still do integration testing – like this:

// Find myself some test stuff
var xxx = from .... where ... .FirstOrDefault(); 
if (xxx == null) Assert.Ignore("Cannot run -- need blah blah blah in DB"); 
// proceed with test
// undo what you did, possibly with fancy transactions
// or if it's a read-only operation, that's even better.

The Assert.Ignore() paints the test yellow – with a little phrase, stating what needs to happen, before the test can become active.

I could also do a test like this:

[Test] 
public void EveryKindOfDoritoIsHandled() { 
    var everyKindOfDorito = // query to get every combination
    foreach (var kindOfDorito in everyKindOfDorito) {
        var exampleDorito = ...... .FirstOrDefault(); 
        // verify that complicated code for this specific Dorito works
    }
}

“Doritos” being a replacement word for a business component that they had many different varieties of, with new ones being added all the time. As the other teams created new Doritos, if we didn’t have them covered (think switch…case… default: throw NotSupportedException()) our test would break, and we would know we had to add some code to our side of the fence. (To complete the picture: our code had to do with drawing pretty pictures of the “Dorito”. And yes, I was hungry when I wrote this paragraph the first time.)

Interestingly, when we changed database environments (they routinely wiped out Dev integration after a release), all tests would go to Yellow/Ignore, then slowly start coming back as the variety of data got added to the system, as QA ran through its regression test suite.

Pattern 8: My Test Has been Green Forever.. Why did it Break Now?

Unit tests only break when code changes. Not so with Integration tests. They break when:

  • The database is down
  • Somebody updates the schema but not the tests
  • Somebody modifies a stored procedure
  • No apparent reason at all (hint: concurrency)
  • Intermittent bug in the database server (hint: open support case)
  • Somebody deleted an index (and the test hangs)

These are good things. Given something like TeamCity, which can be scheduled to run whenever code is checked in and also every morning at 7am, I get a history of “when did it change” — because at some point it was working, then it wasn’t.

If I enable the integration tests to dump what they are doing to the console, I can go back through TeamCity’s build logs, see what happened when it was last green and what it looked like when it failed, and deduce what the change was.

The fun part is: if all the integration tests are passing, the system is probably clear to demo. This reduces my stress significantly come demo day.

Pattern 9: Testing File Systems

As I do a lot of batch processing work, I create temporary file systems as well. I utilize %TEMP% + “TEST” + testname, delete it thoroughly before recreating it, just like with databases.
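
The same root idea, applied to a temp directory (a sketch; TempDirTestRoot is a made-up name):

// %TEMP% + "TEST" + test name, wiped before recreation, left behind afterwards -
// the same pattern as the database test roots.
using System;
using System.IO;

public class TempDirTestRoot : IDisposable
{
    public string Root { get; private set; }

    public TempDirTestRoot(string testName)
    {
        Root = Path.Combine(Path.GetTempPath(), "TEST" + testName);

        // Delete thoroughly before recreating, just like the database roots.
        if (Directory.Exists(Root))
            Directory.Delete(Root, recursive: true);
        Directory.CreateDirectory(Root);
    }

    public void Dispose()
    {
        // Left behind on purpose, for manual inspection after the test run.
    }
}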

In Conclusion

Perhaps I should rename this to “My Conclusion”. What I have found:

  • I love writing unit tests where it makes sense – a component with complicated circuitry can use a test around that circuitry.
  • I love even more writing integration tests over the entire system – one simple test like “CompletelyProcessExampleFile1” tells me at a glance that everything that needs to be in place for the REAL WORLD Example File 1 to be processed is working.
  • It takes time.
  • It’s definitely worth it (to me).
  • It’s infinitely more worth it if you do a second project against the same database.

May this information be useful to you.