Diabetes Type II

I am a Type II diabetic. I talk more about that on my LiveJournal.

I was reading a book, Wheat Belly by William Davis, MD. It made a lot of sense, and fit directly into the knowledge that I already had – it just gave me a new term, “AGEs” (advanced glycation end products). This confluence inspired me to put together a visio^H^H^H^H^H creately diagram of the concepts that I knew of so far, about my diabetes.

Here it is (click for full size):

This will probably get updated and reposted over time.  If you have any questions, ask, and I’ll tell you what I understand (but remember: I am NOT a doctor.  Just a geek.  With Diabetes Mellitus Type II.)

Duplicating sections of a PostgreSQL database using PowerShell

The Problem

  • The customer has a large PostgreSQL database; it is too large to transfer over a VPN.
  • I need to develop against a local copy of the database, where I can make schema modifications at will.

My Solution

  • Pull the schema
  • Pull the sequence information separately (it did not come over with the schema)
  • Pull full dumps for small tables (in order)
  • Pull subsets for large tables (in order)
  • Load everything locally
  • Do this in a script

Here is the code for the solution, with some commentary as to why certain things are the way that they are:

GetData.ps1

$PGDUMP = get-command pg_dump.exe 
$PSQL = get-command psql.exe

get-command verifies that it can find the executable in your current path, and complains immediately if it cannot.
I try to do this for every executable I invoke in a PowerShell script.
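
One caveat (my aside, not part of the original script): by default that complaint is a non-terminating error, so the script will still limp onward with a $null variable. A minimal sketch to make it a hard stop:

$PGDUMP = get-command pg_dump.exe -ErrorAction Stop   # stop the script right here if pg_dump.exe is not on the PATH
$PSQL = get-command psql.exe -ErrorAction Stop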

$Env:PGCLIENTENCODING="SQL_ASCII"
$H="111.22.33.44"
$U="sgulati"
$P="5432"
$DB="deathstardb"

PGCLIENTENCODING was necessary because some of the rows in their database had UTF-8-like characters that confused the loader. I arrived at it by trial and error.

. .\tableconfig.ps1

Because I use the same configuration for getting data as for loading data, I pushed that into its own file.

tableconfig.ps1

$FULLTABLES = @( 
   "ds_employees.employees", 
   "ds_contacts.contact_types",
   "ds_contacts.companies",
   "ds_contacts.systems", 
   "ds_inbound.clients",
   "ds_inbound.feeds",
   "ds_inbound.pendingfiles"
); 
$PARTIALTABLES = @( 
   @(   "ds_inbound.processedfiles", 
        "select * from inbound.processedfiles where clientid='555' "
   ), 
   @(   "ds_inbound.missingfiles",
        "select * from inbound.missingfiles where clientid='555' "
    )
);

$FULLTABLES are tables I’m going to grab all data for.
$PARTIALTABLES are tables which I cannot grab all data for (they are too large), so I’m just going to grab the subset that I need.

# PG_DUMP
# http://www.postgresql.org/docs/8.1/static/app-pgdump.html
# -s = schema only
# -a = data only
# -F = format: p = plain, c = custom
# -O = --no-owner
# -f = output file
# -C = --create
# -d = --inserts
# -X = --disable-triggers
# -E = encoding = SQL_ASCII

When a script calls a command with confusing command-line options, I put a comment in the script explaining
what the options mean, along with a link to the online documentation.
This helps with future maintenance of the script.

$exportfile = "${DB}.schema.sql"
if (! (test-path $exportfile)) { 
   "Schema: $exportfile"
   & $PGDUMP -h $H -p $P -U $U --create -F p -O -s -f $exportfile ${DB}
} else { 
   "skip schema: $exportfile"
}

I use a convention that if something has been pulled, do not pull it again.
This enables me to selectively refresh pieces by deleting the local cache of those files.

Note that the pg_dump command creates a schema file, but does NOT pull current sequence values.

$exportfile = "${DB}.sequence.sql"
if (! (test-path $exportfile)) { 
    $sql = @"
select N.nspname || '.' || C.relname as sequence_name
from pg_class C
join pg_namespace N on C.relnamespace=N.oid
where relkind='S'
and N.nspname like 'ds_%'
"@
    $listOfSequences = ($sql | & $PSQL -h $H -p $P -U $U -d $DB -t)
    $sql = @()
    foreach ($sequence in $listofsequences) { 
       $trim = $sequence.trim(); 
       if ($trim) { 
           "Interrogating $sequence"
           $lastval = ( "select last_value from $trim" | & $PSQL -h $H -p $P -U $U -d $DB -t ) 
           $sql += "select setval('${trim}', $lastval);" 
       }
    }
    $sql | set-content $exportfile
} else { 
    "skip sequence: $exportfile"
}

This gets complicated:

  • I am running a query to get every sequence in the system; then, for each of those sequences, I’m getting the last value.
  • I am doing this by executing PSQL and capturing its output as text; I could have done it with Npgsql called directly from PowerShell, but I didn’t go down that route at the time this was written.
  • I am saving the information in the form of a SQL statement that sets the value correctly. This eliminates the hassle of understanding the data format.
  • I am relying on the customer’s convention of prefixing their schema names with “ds_” to filter out the system sequences. You may need a different approach.

Update: My customer read through this post, and pointed out something I had missed: There’s a view called

pg_statio_user_sequences

which provides a list of sequences. I still need to loop to get the current values… nevertheless, nice to know!
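
For the curious, here is roughly how that view would slot into the script above – a sketch only, not something I have run against their database:

# same idea as the pg_class/pg_namespace join above, but via the statistics view
$sql = @"
select schemaname || '.' || relname
from pg_statio_user_sequences
where schemaname like 'ds_%'
"@
$listOfSequences = ($sql | & $PSQL -h $H -p $P -U $U -d $DB -t)
# ...then the same foreach loop as before to capture last_value for each sequence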

foreach ($fulltable in $FULLTABLES) { 
  $exportfile = "${DB}.${fulltable}.data.sql";
  if (! (test-path $exportfile)) { 
     "Full: $exportfile"
     & $PGDUMP -h $H -p $P -U $U --inserts --disable-triggers -F p -E SQL_ASCII -O -a -t $fulltable -f $exportfile ${DB}

     # we need to patch the SET search_path in certain situations
     if ($exportfile -eq "deathstardb.ds_inbound.feeds.data.sql") { 
        $content = get-content $exportfile
        for ($i = 0; $i -lt $content.length; $i++) { 
           if ($content[$i] -eq "SET search_path = ds_inbound, pg_catalog;") { 
              $content[$i] = "SET search_path = ds_inbound, ds_contacts, pg_catalog;"
           }
        }
        $content | set-content $exportfile
     }

  } else { 
     "Skip full: $exportfile"
  }
}

This executes pg_dump on the tables where we want full data, and dumps them into “rerunnable SQL” files.
However, some of the triggers (that are pulled with the schema) were badly written; they made assumptions about the runtime search_path (a Postgres thing) and thus failed.
I fixed that by adding some search-and-replace code to convert bad SQL into good SQL for the specific instances that were dying.

foreach ($partialtabletuple in $PARTIALTABLES) { 
  $partialtable = $partialtabletuple[0];
  $query = $partialtabletuple[1]; 
  $exportfile = "${DB}.${partialtable}.partial.sql"; 
  if (! (test-path $exportfile)) { 
      "Partial: $exportfile"
	  & $PSQL -h $H -p $P -U $U -c "copy ( $query ) to STDOUT " ${DB} > $exportfile
  } else { 
	 "skip partial: $exportfile"
  }
}

This runs PSQL in “copy (query) to STDOUT” mode to capture the data from a query to a file. The result is a tab-separated file.

LoadData.ps1

Things get much simpler here:

$PSQL = get-command psql.exe
$Env:PGCLIENTENCODING="SQL_ASCII"
$H="localhost"
$U="postgres"
$P="5432"
$DB="deathstardb"

. .\tableconfig.ps1

# PSQL
# -c = run single command and exit

$exportfile = "${DB}.schema.sql"
& $PSQL -h $H -p $P -U $U -c "drop database if exists ${DB};"
& $PSQL -h $H -p $P -U $U -f "${DB}.schema.sql"
& $PSQL -h $H -p $P -U $U -d ${DB} -f "${DB}.sequence.sql"

I’m going with the model of doing a full wipe – I don’t trust anything locally, I am far too creative a developer for that – hence I drop the database and start fresh.
I create the schema from scratch (there are a few errors, but they haven’t bitten me yet)
and then I set all the sequence values.

foreach ($fulltable in $FULLTABLES) { 
  $exportfile = "${DB}.${fulltable}.data.sql"
  & $PSQL -h $H -p $P -U $U -d ${DB} -f $exportfile
}

Important: The data is loaded IN ORDER (as defined in $FULLTABLES), so as to satisfy FK dependencies.
To figure out the dependencies, I used pgAdmin’s “dependencies” tab on an object, and drew it out on paper.
It seemed daunting at first, but upon persevering, it was only 6-7 tables deep. For comparison, a job I had in 2006 had something like 30+ tables, 7 deep.
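
If drawing it out on paper is not your thing, the catalog can list the foreign-key edges for you. This is a sketch of the kind of query I would use (not what I actually did at the time):

# each row is one FK edge: the child table references the parent table,
# so a parent must appear earlier in $FULLTABLES than its children
$sql = @"
select conrelid::regclass as child_table, confrelid::regclass as parent_table
from pg_constraint
where contype = 'f'
order by 2, 1
"@
$sql | & $PSQL -h $H -p $P -U $U -d $DB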

foreach ($partialtabletuple in $PARTIALTABLES) { 
  $partialtable = $partialtabletuple[0];
  $query = $partialtabletuple[1]; 
  $exportfile = "${DB}.${partialtable}.partial.sql"; 
  get-content $exportfile | & $PSQL -h $H -p $P -U $U -d ${DB} -c "copy $partialtable FROM STDIN "
}

Source Control

I check everything into source control (Subversion for me):

GetData.ps1
LoadData.ps1
Data\tableconfig.ps1
Data\deathstardb.schema.sql
Data\deathstardb.sequence.sql
Data\deathstardb.ds_employees.employees.data.sql
Data\deathstardb.ds_contacts.contact_types.data.sql
Data\deathstardb.ds_inbound.processedfiles.partial.sql
(etc)

Important bits here:

  • My client did not have a copy of their schema in source control. Now they do.
  • The naming convention makes it easy to know what each file is.
  • I’m keeping the data in a separate folder from the scripts that make it happen.

Additional Scripting

There are some additional scripts that I wrote, which I am not delving into here:

  • the script that, when applied to a copy of the production database, creates what I am developing with.
    • Luckily, what I’m doing is all new stuff, so I can rerun this as much as I want; it drops the whole schema and recreates it with impunity (a sketch of that pattern follows this list)
  • the script to apply the above (dev) changes to my local database
  • the script to apply the above (dev) changes to my development integration database
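
For illustration, the heart of that “drop a whole schema and create” script looks something like this; the schema and table names are made up, and I am reusing the connection variables from LoadData.ps1:

# hypothetical dev-changes script: wipe my dev schema and rebuild it from scratch
$devsql = @"
drop schema if exists ds_newstuff cascade;
create schema ds_newstuff;
create table ds_newstuff.widgets (
    widgetid serial primary key,
    name text not null
);
"@
$devsql | & $PSQL -h $H -p $P -U $U -d $DB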

Whenever I’m working with a database, I go one of two routes:

  • I use the above “make a copy of prod” approach as my “start over”, and only have a script of forward-changes
  • I make my script do an “if exists” for everything before it adds anything, so it is rerunnable (sketched below).
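
Here is a minimal sketch of that guard, in the same PowerShell-plus-psql style as above; the priority column on ds_inbound.feeds is a made-up example:

# only add the (hypothetical) column if it is not already there, so the script can be rerun safely
$checksql = @"
select 1 from information_schema.columns
where table_schema = 'ds_inbound' and table_name = 'feeds' and column_name = 'priority'
"@
$exists = ($checksql | & $PSQL -h $H -p $P -U $U -d $DB -t) -join ""
if (! $exists.Trim()) {
    "alter table ds_inbound.feeds add column priority integer;" | & $PSQL -h $H -p $P -U $U -d $DB
}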

With either approach it’s very important that when a production rollout occurs, I start a new changes script, and grab a new copy of the schema.

There is a newer third route – which is to use some kind of software that states with authority, “this is what it should be”, and allows a comparison and update to be made against an existing data source. Visual Studio Database Solutions are one such example; ER/Studio is another. Hopefully, it does its job right! Alas, this client does not have that luxury.

In conclusion

Getting my development environment repeatable is key to reducing stress. I believe The Joel Test calls it out in #2: “Can you make a build in one step?”

I used a ton of tricks to get it to work; it felt like I was never going to get there, but I did. If you do something 3-4 times, you might want to automate it.

May your journey be similarly successful.

Backing up and Restoring

I recently helped my wife set up her new work computer. I could not do everything; the IT guy had to come in and add it to the domain, and she installed various essentials like Minesweeper (j/k, I think it was Photoshop).
Being a good geek, I intend to have a good image of that computer now that it’s set up.

So, I practiced on my laptop tonight.

Step 1: Back up the machine.
Hook up the external hard drive.
Boot off Hiren's Boot Disk.
Basically following http://sir-sherwin.blogspot.com/2011/04/disk-imaging-using-acronis-true-image.html
(except that I used Seagate Disk Wizard Something Something with Acronis support).
2.5 hours later, I have several .tib files (partitioned into 4.7 GB chunks).
For reference, the laptop had 55 GB of used HD space.

Step 2: Play with the backup.
Attached the external hard drive to my big computer.
Downloaded http://www.vmware.com/products/converter/
Started to convert the .tib file into a VMware image.
There were a lot of options; I ended up hydrating to an 80 GB virtual drive, and got to choose the partitioning scheme.

1.5 hours later, I have a VMware image I can run.

Step 3: See the laptop living in a VM on the big computer.
Left = Original; Right = VM

There are a few problems with drivers; to do it perfectly, I would sysprep the machine…
It definitely validates the backup, though.

Just ’cause I could.
Yep, life is good.

What do I bill?

My first client in the consulting world at my current gig was a big company. They welcomed me spending all kinds of time, including overtime, to get their product built faster. They were also using somewhat older technologies – VS2008, VB.Net, WCF/SOA, a hand-written DAL, no BOL – to get the work done, and as a result, I knew pretty much what I was going to do. There was no “play” time needed.

My second client is a small company. Every hour counts. The fewer hours, the better. It’s also entirely up to me how to build it – so, how new do I go?

  • I could use all the cutting-edge stuff that I don’t know fluently, and charge the client for me learning the hard way.
    • This seems unethical.
  • I could stick to older stuff that I do know.   Only sell the client the skills I’m awesome at, and spend time to bring some skills up to awesome.
    • It takes a lot of extra time to become awesome at things.
    • As I have a family I like, this is impractical.
  • A mix of the above.
    • How?

My Employer’s Solution

They have set it up so that I am “salaried” at 36 hours, and I have an extra 4 hours to do a self-directed project. During this time, I can become awesome at stuff I don’t yet know! And it doesn’t have to be related to any current or future project, just stuff I want to figure out.
(I have a huge list of things I want to play with and get working…)

My Solution

I have partitioned my list into three sections:

  • BUCKET A: The stuff I know how to do fairly well
    • Architecture decisions
    • Project management
    • Console app, Parsing command line options
    • Setting up diagnostics in various places to make the utilities easier to use
    • Database design
    • Setting up local test data environment
    • Research into options available
  • BUCKET B: The stuff I don’t know how to do yet, that I will definitely have to learn. I do charge for this, and try to get it working as fast as I can.
    • Fluent NHibernate + NHibernate
    • MVC3. I am NOT going WebForms, sorry.
    • Dealing with ENUMs in PostgreSQL and mapping those to enums in C#.
  • BUCKET C: The stuff I need to play with to figure out and use, but if I don’t figure it out immediately, I can get by without it.
    • SpecFlow / BDD
    • fastest way to set up and tear down data in the database for functional/integration testing
    • Selenium

Then, it’s not that I stay away from Bucket C; it’s more that I focus most of my time on Bucket A until I feel I’ve been productive, then B, and then maybe C.

I also timebox Bucket C. For example, I researched SpecFlow today, figured out that yes, I want to use it, and then I cut that off at 0.5 hours. The rest of my SpecFlow “play” time will be “on my own” – until I get it working well enough that I can move it down into Bucket A.

This gives me a clear conscience – I’m not charging the customer for playing with and learning stuff that was not required. Instead, I realize that with any new technology, there’s going to be some “settling” of it into my toolbelt, and that does take time. I cannot “rush” that time – so I’ll do myself a favor, and not pressure myself into learning it quickly.

For that matter, I timebox Bucket B as well. For example, I could not get C# enums to save as PostgreSQL ENUMs in about an hour of trying. I had a workaround – save as TEXT. I went for it – I can revisit this at another time.

How this works at home

Hanging out with my wife is one of the joys of my life, and I do not shelve that easily. Luckily and synergistically, my wife had dinner with friends today, so I was able to make tonight into “tech playtime”. I combined playtime with bringing my VMware work image home on an external hard drive – and that worked beautifully as well.

My next chance for evening tech playtime, unfortunately, may not be till next week. But, if I can get ahead of my required hours at work, I might convert some work time into play time. Getting in to work early means I can do playtime starting at 3 or 4 pm!

 

Why would I ever read a Technical Book?

When I was in college, I used to laugh at the “technical books” section of the bookstore in the mall. Well, actually I didn’t, because at the time, I would go there exclusively to drool over all the science fiction books – $3.50 or so each – that I could not afford, as I was living on ramen noodles and cans of peas, because that’s all I could afford. (link)

Then, when I became a working stiff paid professional, I would go to the technical books section and laugh, because… I knew all that stuff. There was a lot less to know in the early 1990s, and there was a lot of stuff “beneath me” (dBase II, FoxPro, etc.). (I was cool, I was porting apps from Clipper S87 to 6.0, and nothing came even close to the beauty of LPC.)

In the late 1990s, I would sneer, because I was a close-minded anti-Microsoft pro-Linux-Perl guy, and I really did not want to know MFC. I did, however, buy and own the Perl Cookbook, which opened my mind to the amazing ways to hack things into place to get things done. I used that book a LOT. (The next year, I got sent to a C# class because they had an extra spot, and I changed my mind about Microsoft. Actually, I would say that Microsoft changed, and no longer annoyed me.) C# was almost as awesome as LPC. (link)

For a while, in 2006 – when I found myself facing unemployment (it lasted all of about 2 weeks) – I browsed technical books, lamenting: so much to learn, what shall I learn? I ended up gravitating towards unit testing and ASP.NET WebForms, which I learned almost entirely via Google, not from a book. Thank you .NET Rocks and http://www.hanselminutes.com/ for the pointers! In this case, the dead-tree books did not do anything for me, and being unemployed, I felt I shouldn’t be spending $$$ if stuff was available on the internet for free.

I did buy some technical books in 2008 to read on vacation – wow, that was a rousing success. (Not.) I hardly picked them up. A waste of $100+. (Patterns and Practices in C#, something else.) They’re still too expensive, when all of that knowledge is available for relatively free on the internet.

So, I repeat the question:

Why would I ever buy a technical book?

My answer:

I bought one two nights ago. I wanted to know about how to use EF Code First – I couldn’t sleep – I bought the book on my iPad Kindle – and I read it, cover to cover, in about 45 minutes. And my lightbulb was born.

Here is what is different:

  • I have a very specific application, for which I might need the technology. This is not “reading for fun”, but rather, reading to get a specific job done.
  • I don’t know enough about the technology to know what to search for. (Searching online only gave me introductory examples, nothing with real meat.)
  • I’m approaching it not as “a ton of money spent for a dead paperweight that I’ll never look at again”, but rather as a fairly inexpensive class briefing me on a specific subject which I can refer back to later. Most of these books cost me less than an hour’s work, after taxes. (I am a professional, and I need to know as much as possible, as quickly as possible, to give my client the kind of service that I want to give them.)
  • I have an e-reader on my iPad, and pretty soon, on an e-ink device. I can archive with impunity, without killing bookshelves.

And thus, I’m sold. Here’s what I choose to read up on for my current client, to ensure I’m giving them the best that I can:

  • PostgreSQL (done) (PDF, free)
  • EF Code First (done) ($10)
  • EF (general)
  • ASP.NET MVC 3 +/- Razor
  • Dependency Injection (StructureMap vs Unity)

Whee!