Learning New Stuff–Terraform, AWS, Lambda, DotNetCore

This last week has been a crash course in new stuff for me.  I’m helping with the scripts that manage the infrastructure around a project – freeing up the developer to work on user stories, I’m taking care (or trying to take care) of the deployment aspects of it.   In a way, its a big catch-up to other folks who have been charging ahead into newer technologies – so its not like I’m having to discover things on my own.  On the other hand, everything is already evolved to N+2, and I’m at N-1, so its a bit of a firehose.

Here goes though, stuff I’ve picked up this week:

  • Teamcity build calling a powershell script to do deployment stuff. 
    • New to me: I didn’t know PSPROJ was a thing – that I could step debug through powershell in Visual Studio now.  Come a long way since Powershell 1.0.
    • Dotnet lambda package, zipping, sending to S3…   Somebody else whose first name rhymes with “Miss” and last name rhymes with “Aye Lee” wrote this part for something else, I get to adapt it for the current project.
  • AWS API Gateway => AWS Lambda => C# NetCore1.0 => MVC  chain
    • Got to learn about the “Version Hell” that happens in NetCore1.0.   It will probably be much nicer by the time we get to 2.0 or better.. just the 1.0 to 1.1 is pretty rough at the moment.   Get the intersection of the bleeding edge of NetCore as it was 7 months ago with the bleeding edge of where AWS is taking their Amazon Linux.   We had to do a deviation and host some stuff via EB rather than Lambda. 
    • I’ll be playing more with this on Monday as I try to debug something into not giving me a 500 internal server error.
  • Terraform as a way of deploying AWS Resources
    • Modules, and Variables, and Data sources, oh my.
    • Debugging Terraform – I found the GET/POST requests.. the problem was a Content/Type for a resource in an S3 bucket.  Can’t get .body that way, so couldn’t get the hash value.
    • Partial apply’s because sometimes you don’t recognize a change and don’t want to mess up somebody else’s experimentation
    • I got to copy what Miss Aye Lee did, nice job Dude.
  • Rewrapping my brain around Build Configurations
    • Thanks to previous training, Build Config = Debug (PDB) vs Release, but also = XSLT Config Transforms to get configuration values per environment.
    • Now, Build Config = just Debug vs Release for “how debuggable do you want this”
    • There’s another avenue for “which settings do you want to use” which is completely different.
    • More playing with this on Monday.
  • AWS Security stuff
    • IAM User’s for local access from visual studio while developing
    • Roles for when running in Lambda, EC2, etc.  (Built by Terraform)
    • Policy documents describing what access available to what (built by Terraform), shared by the IAM and Role.
    • All the stuff that was actually built by Terraform using a Terraform runner credential
    • The Terraform Runner’s policy that allows it to create all the things
    • All running in another account that we cross-account assume roles into.
    • Somebody whose first name does not sound like XML and whose last name might have to do with Whiskey is a good teacher and dreamer.

The end result:

  • If starting from scratch – done by human.
    • cd env-shared;  terraform plan & apply to create shared resources, like S3 buckets, VPC’s, RDS’s, etc
    • Any further environment changes, also applied by human via script file.  No clicky the mouse.
  • New environment – like QA1 or QA2 or other – done by human
    • cd env-qa1 (or mkdir, if starting new)
    • copy and edit a file that says what the environment name is
    • terraform plan and apply to create all the things
      • DynamoDB tables
      • Queues
  • Every build to be deployed – automated, not done by human.
    • powershell to get stuff up to S3
    • powershell to call terraform to deploy
      • Lambda
      • API Gateway hangs out with this.

Pretty powerful stuff.     Glad I’m learning it.   It will feel better end of next week when I actually have something completely checked in that completely works.   

Car Stats–Speed vs Mileage Revisited

Previous posts in the series:

Revisiting the second post in this series, but this time:

  • All the data collected to date
  • Red for uphill, green for downhill, size of point for magnitude
  • Blue for mostly level

image

Analysis:

  • The ECU allows idling the engine when coasting, until I get to the last gear, at which time it changes its strategy – it always provides enough fuel to keep the engine purring at a somewhat higher number.  Probably because it doesn’t disengage the drivetrain.  But it does reduce the gas such that it’s a flat line across at 110 mpg or so.    (just enough oomph to prevent the engine from spinning down so fast that the car feels like its stuck in mud, probably.)
  • I get better gas mileage around 42 mph – closer to 40mpg.  Then it drops down to the 33mpg range as I get up to 55, but pretty much stays there through 75mph.
  • When accelerating, the engine opens up in such a way that I get a nice flat line at the bottom of the graph.

Code comments:

  • I added a column for altitude change – detected it within a file, I didn’t want to do it outside of the file boundary.
  • Sometimes, there’s a trailing comma in the column names.
  • I added better Axes to the graph.

Code:

$alldata = @(); 
$files = gci . -r -include "trackLog.csv"
foreach ($file in $files) { 
   $lines = get-content $file
   "Processing {0}: {1} lines" -f $file, $lines.count
   
   # to get around errors with header names not being valid object names
   $lines[0] = $lines[0] -ireplace '[^a-z,]','' 
   if (-not $lines[0].EndsWith(",")) { $lines[0] = $lines[0] + "," } 
   $lines[0] = $lines[0] + "AltChange"
   
   $data = ($lines | convertfrom-csv)
   
   for ($i=1; $i -lt $data.Count; $i++) { 
       $prevAlt = [double]$data[$i-1].AltitudeM
       $Alt = [double]$data[$i].AltitudeM
       if ($prevAlt -ne $null -and $Alt -ne $null) { 
           $data[$i].AltChange = $Alt - $prevAlt
       }
   }
   
   $alldata = $alldata + $data
}
"Total of {0} items" -f $alldata.count

$altmeasure = $alldata | measure-object AltChange -min -max -average

[void][Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms.DataVisualization")
$chart = new-object System.Windows.Forms.DataVisualization.Charting.Chart
$chart.width = 800
$chart.Height = 600
$chart.Left = 40
$chart.top = 30
$chart.Name = "Foo"

$chartarea = new-object system.windows.forms.datavisualization.charting.chartarea
$chart.ChartAreas.Add($chartarea)

$legend = New-Object system.Windows.Forms.DataVisualization.Charting.Legend
$chart.Legends.Add($legend)

$series = $chart.Series.Add("Series1")
$series = $chart.Series["Series1"]
#FastPoint ignores color
$series.ChartType = [System.Windows.Forms.DataVisualization.Charting.SeriesChartType]::Point
$series.IsXValueIndexed = $false

$thresh = 0.05

foreach ($data in $alldata)
{
    if ($data.MilesPerGallonInstantMpg -eq $null) { continue } 
    if ($data.GpsSpeedkmh              -eq $null) { continue } 
    if ($data.AltChange                -eq $null) { continue } 

    $speed = [double]$data.GpsSpeedkmh * 0.621371          
    $mpg =   [double]$data.MilesPerGallonInstantMpg 
    $alt =   [double]$data.AltChange  
        
    if ($alt -lt -$thresh) { 
        # downhill green
        $color = [System.Drawing.Color]::FromARGB(100,0,255,0)
        $markersize = 1 - ($alt*2)
    } elseif ($alt -lt $thresh) { 
        $color = [System.Drawing.Color]::FromARGB(100,0,0,255)
        $markersize = 2
    } else  { 
        # uphill red
        $color = [System.Drawing.Color]::FromARGB(100,255,0,0)
        $markersize = 1+$alt*2
    }  
    
    if ($markersize -gt 5) { $markersize = 5 }
    
    $datapoint = New-Object System.Windows.Forms.DataVisualization.Charting.DataPoint($speed,$mpg)   
    $datapoint.Color = $color
    $datapoint.MarkerSize = $markersize
    
    $series.Points.Add($datapoint)
}

$chartarea.AxisX.Name = "Speed MPH"
$chartarea.AxisX.Interval = 5
$chartarea.AxisX.Minimum = 0
$chartarea.AxisX.IsStartedFromZero=$true

$chartarea.AxisY.Name = "MPG"
$chartarea.AxisY.Interval = 10
$chartArea.AxisY.Minimum = 0
$chartarea.AxisY.IsStartedFromZero=$true

$Form = New-Object Windows.Forms.Form 
$Form.Text = "PowerShell Chart" 
$Form.Width = 1100 
$Form.Height = 600 
$Form.controls.add($Chart) 
$Chart.Dock = "Fill" 
$Form.Add_Shown({$Form.Activate()}) 
$Form.ShowDialog()

Car Stats: Graphing with Powershell – Where Have I Been?

Previous posts in the series:

I could barely contain myself.  Thurday night, I had all kinds of data.. just calling my name.  

GPS Time, Device Time, Longitude, Latitude,GPS Speed(km/h), Horizontal Dilution of Precision, Altitude(m), Bearing, Gravity X(G), Gravity Y(G), Gravity Z(G),Miles Per Gallon(Instant)(mpg),GPS Altitude(m),Speed (GPS)(km/h),Run time since engine start(s),Speed (OBD)(km/h),Miles Per Gallon(Long Term Average)(mpg),Fuel flow rate/minute(cc/min),CO₂ in g/km (Average)(g/km),CO₂ in g/km (Instantaneous)(g/km)
Thu Jan 03 16:29:06 EST 2013,03-Jan-2013 16:29:11.133,-85.57728556666666,38.24568238333333,0.0,16.0,196.9,0.0,-0.015994536,0.9956599,0.0949334,0,196.9,0,17,0,31.66748047,19.45585442,259.47247314,-
Thu Jan 03 16:29:09 EST 2013,03-Jan-2013 16:29:14.004,-85.57729401666667,38.245684716666666,0.0,12.0,195.2,0.0,-0.015994536,0.9956599,0.0949334,0,195.2,0,20,0,31.66731453,45.80973816,259.47247314,-

Friday afternoon, once work things were completed, I started playing with it.  To start, I tried to read in the CSV file into Powershell.   I figured once I had it there, I could do *something* with it.

I faced some challenges:

  • The CSV column names are not “clean”, so I needed to sanitize them
  • Some files did not have certain pieces of data.
  • CSV import was a string, needed to be casted to a number before certain operations (“137” –gt 80.0 = false)
  • The units are fixed, part of the Torque app. (Actually, part of the OBDII standard)

After getting it read in, I looked around for a graphing library. Turns out I can use System.Windows.Forms.DataVisualization with Powershell. (thank you sir), which had some fun stuff:

  • FastPoint ignores colors
  • Had to turn off auto-scale on the Y-Axis
  • Made the charting controll Dock-Fill in the form

I ended up with this script:

$alldata = @(); 
$files = gci . -r -include "trackLog.csv"
foreach ($file in $files) { 
   $lines = get-content $file
   "Processing {0}: {1} lines" -f $file, $lines.count
   
   # to get around errors with header names not being valid object names
   $lines[0] = $lines[0] -ireplace '[^a-z,]','' 
   
   $data = ($lines | convertfrom-csv)
   $alldata = $alldata + $data
}
"Total of {0} items" -f $alldata.count

$speedmeasure = $alldata | measure-object GpsSpeedKmh -min -max
$speedspread = $speedmeasure.Maximum - $speedmeasure.Minimum
if ($speedspread -le 1.0) { $speedspread = 1.0 }

$mpgmeasure = $alldata | measure-object MilesPerGallonInstantmpg -min -max
$mpgspread = $mpgmeasure.Maximum - $mpgmeasure.Minimum
if ($mpgspread -le 1.0) { $mpgspread = 1.0 }

$ffmeasure = $alldata | where-object { $_.Fuelflowrateminuteccmin -ne "-" } | measure-object Fuelflowrateminuteccmin -min -max
$ffspread = $ffmeasure.Maximum - $ffmeasure.Minimum
if ($ffspread -le 1.0) { $ffspread = 1.0 }

[void][Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms.DataVisualization")
$chart = new-object System.Windows.Forms.DataVisualization.Charting.Chart
$chart.width = 800
$chart.Height = 600
$chart.Left = 40
$chart.top = 30
$chart.Name = "Foo"

$chartarea = new-object system.windows.forms.datavisualization.charting.chartarea
$chart.ChartAreas.Add($chartarea)

$legend = New-Object system.Windows.Forms.DataVisualization.Charting.Legend
$chart.Legends.Add($legend)

$series = $chart.Series.Add("Series1")
$series = $chart.Series["Series1"]
#FastPoint ignores color
$series.ChartType = [System.Windows.Forms.DataVisualization.Charting.SeriesChartType]::Point
$series.IsXValueIndexed = $false

foreach ($data in $alldata)
{
    if ($data.MilesPerGallonInstantMpg -eq $null) { continue } 
    if ($data.Fuelflowrateminuteccmin -eq $null) { continue } 
    if ($data.Fuelflowrateminuteccmin -eq "-") { continue } 

    $speed = (([double]$data.GpsSpeedkmh              - $speedmeasure.Minimum) / $speedspread)  
    $mpg =   (([double]$data.MilesPerGallonInstantMpg -    $mpgspread.Minimum) /   $mpgspread)
    $ff    = (([double]$data.Fuelflowrateminuteccmin  -    $ffmeasure.Minimum) /    $ffspread)
    
    $higherspeed = $speed; 
    if ($higherspeed -gt 0.05) { $higherspeed = [Math]::Sqrt($speed) }
    $lowerspeed = $speed * $speed; 
    
    # MPG numbers seem to be clustered closer to 0 with a few annoying outlyers. spread them up a bit.
    #if ($mpg -gt 0.05) { $mpg = [Math]::Sqrt($mpg) }    

    # calculate color.   
    $blue = 250*$ff;
    
    # slower = more red
    # faster = more green
    # medium = yELLOW!
    $red = 250 - (250 * $lowerspeed)
    $green = 250 * $higherspeed
    
    $datapoint = New-Object System.Windows.Forms.DataVisualization.Charting.DataPoint($data.Longitude, $data.Latitude)   
    $color = [System.Drawing.Color]::FromARgb(125,$red, $green, $blue)
    
    $datapoint.Color = $color
    $series.Points.Add($datapoint)
    $datapoint.MarkerSize = ($mpg)*5 + 1
}

$chartarea.AxisY.IsStartedFromZero=$false

$Form = New-Object Windows.Forms.Form 
$Form.Text = "PowerShell Chart" 
$Form.Width = 1100 
$Form.Height = 600 
$Form.controls.add($Chart) 
$Chart.Dock = "Fill" 
$Form.Add_Shown({$Form.Activate()}) 
$Form.ShowDialog()

Which reads everything in, scrubs some data, figures some transforms, and yields me the following pretty picture:

image_thumb5

  • Green = faster, Red = slower.
  • Blue = Gas used (doesn’t show very well)
  • Fatter = Better MPG (I should reverse this, probably)
  • The green lines are the interstates (I71, I264, I64, and I265 shown)
  • Stop lights show up as little red dots.
  • I had to hand-scale the window till it looked right. Real co-ordinate systems some other day.

I would like to do this in 3-D, but I haven’t gotten my Processing chops quite figured out yet. Maybe next week!

Calories In Minus Calories Out


There was a conversation at work. Simplified:

VP (Visiting Person): “What Kind of Cookies do you have here?”
Me: “I don’t know, I haven’t checked. I’m not eating Wheat right now, and that shelf is all wheat.”
VP: “How’s that going? I’ve heard about that.”
Me: “Pretty good. I’ve lost a few pounds even without exercising.”
VP: “Yep, calories in and calories out.”

My brain took off on that conversation. Hence, this blog post.

Normal

The assumption is that people simply control the calories out. “Just work out harder”.

# Powershell
"Normal Situation: "
"Calories In : {0}" -f ($caloriesIn = 2000)
"Calories Out: {0}" -f ($caloriesOut = 2000)
"Net Calories: {0}" -f ($netCalories = $caloriesIn - $caloriesOut)
"Gain/Loose {0} lbs per week" -f ($netlbsperweek = $netCalories * 7/3500)

Normal Situation: 
Calories In : 2000
Calories Out: 2000
Net Calories: 0
Gain/Loose 0 lbs per week

True, but there’s a twist that I found out, due to my type two.

Conjecture for DM2

This is only my understanding of it. It is not proven or fact.

The hidden factor for me was insulin release, and and insulin resistance.

Lets say that for the calories I was taking in, my blood sugars bumped up to the 140+ range. Lets assume that I’m not working out actively at the moment. (when engaged in physical activity, some other form of transport happens, and sensitivity to insulin goes up).

My body is desperately trying to shove this energy into my cells, and is pumping out insulin (all that it has). The insulin has the effect of transmuting some of these calories to stored fat – almost immediately – before I have a chance to use it.

# Powershell
"Actual Situation: (not on drugs)"
"Calories In:  {0}" -f ($caloriesIn = 2000)
"Converted to fat due to high insulin levels: {0}" -f ($insulinToFat = $caloriesIn * 0.15 )   # total guess
"Fat gained per week: {0} lbs" -f ($fatperweek = $insulinToFat * 7 / 3500)
"Body must make do with {0}" -f ($bodyAvailable = $caloriesIn - $insulinToFat)
"Feeling as though only consuming {0} calories - ie, starving" -f $bodyAvailable

Actual Situation: 
Calories In:  2000
Converted to fat due to high insulin levels: 300
Fat gained per week: 0.6 lbs
Body must make do with 1700
Feeling as though only consuming 1700 calories - ie, starving

My evidence for myself:

There were 2-3 months that I went off my medication, but was watching and logging what I ate.
During that time, I ate 1800-2100 calories a day. Yet, I gained 5 pounds in about 4 weeks.

My deduction:
By storing those 5 pounds => that would mean 17500 calories over 28 days = 625 calories a day.
Which meant that I lived on around 1200-1400 calories a day.
And yes, I felt starved the whole time. (not really starved, but you know, the feeling? I have never really starved, except perhaps once)

So what the heck does my medication (Metformin) do?

Everything says “it limits the production of hepatic (liver) sugar”.
What does that have to do with anything?

My understanding (only my understanding) is:
The body MUST MUST MUST not get into a hypoglycemic situation – because the brain dies. Therefore, it monitors it very seriously.

As blood sugar gets too low, it tells the liver to go make some more. Its not an all or nothing – its a gradual release type thing. However, its tuned based on “relative” levels of blood sugar, not absolute levels.

Being diabetic, and having repeated elevated blood sugar levels “reset” what my body thinks normal is. So my body is churning out sugar even when I’m at a comfortable spot, like 110. It thinks 110 is low.

By jumping in and cutting that link (or reducing it, anyway), Metformin allows my “average” sugar levels to come back down to what they are now.. a fasting number of 80 or so.

And once I get down to 80.. and if I watch what I eat, such that any meal, 2 hours after, I’m back under 140 (these are numbers I’ve chosen for myself), my body leaves the second equation, and goes back to the first equation. Or maybe, the 0.15 goes down and becomes a 0.5. I don’t know exactly.

What I do know is: If I stay off Metformin, my weight goes up, and my fasting blood sugar levels go up. If I stay on it, and I eat wisely, they come back down to normal levels.

In Conclusion

All of the above is my explanation to myself.
Its probably wrong. The reality probably has something to do with aliens, monkeys, ninjas, and a turtle.
If you have a better explanation, grand unifying theory of blood sugar, please do post it and point me at it.

Duplicating sections of a PostgreSQL database using Powershell

The Problem

  • The customer has large postgreSQL database; it is too large to transfer over VPN.
  • I need to develop against a local copy of the database, where I can make schema modifications at will.

My Solution

  • Pull the schema
  • Pull the sequence information separately (it did not come over with the schema)
  • Pull full dumps for small tables (in order)
  • Pull subsets for large tables (in order)
  • Load everything locally
  • Do this in a script

Here is the code for the solution, with some commentary as to why certain things are the way that they are:

GetData.ps1

$PGDUMP = get-command pg_dump.exe 
$PSQL = get-command psql.exe

get-command verifies that it can find the executable in your current path, or dies if it cannot.
I try to do this for every executable I invoke in a powershell script.

$Env:PGCLIENTENCODING="SQL_ASCII"
$H="111.22.33.44"
$U="sgulati"
$P="5432"
$DB="deathstardb"

PGCLIENTENCODING was necessary because some of the rows in their database had UTF-8-like characters that confused the loader. I arrived at it by trial and error.

. .\tableconfig.ps1

Because I use the same configuration for getting data as for loading data, I pushed that into its own file.

tableconfig.ps1

$FULLTABLES = @( 
   "ds_employees.employees", 
   "ds_contacts.contact_types",
   "ds_contacts.companies",
   "ds_contacts.systems", 
   "ds_inbound.clients",
   "ds_inbound.feeds",
   "ds_inbound.pendingfiles"
); 
$PARTIALTABLES = @( 
   @(   "ds_inbound.processedfiles", 
        "select * from inbound.processedfiles where clientid='555' "
   ), 
   @(   "ds_inbound.missingfiles",
        "select * from inbound.missingfiles where clientid='555' "
    )
);

$FULLTABLES are tables I’m going to grab all data for.
$PARTIALTABLES are tables which I cannot grab all data for (they are too large), so I’m just going to grab the subset that I need

# PG_DUMP
# http://www.postgresql.org/docs/8.1/static/app-pgdump.html
# -s = schema only
# -a = data only
# -F = format.. p = plain, -c = custom
# -O = --no-owner
# -f = output file
# -c create
# -d --inserts
# -X --disable-triggers
# -E = encoding = SQL_ASCII

When there are confusing command line options called from a script, I put a comment in a script explaining
what many of the command line options are, along with a link to online documentation.
This helps with future maintenance of the script.

$exportfile = "${DB}.schema.sql"
if (! (test-path $exportfile)) { 
   "Schema: $exportfile"
   & $PGDUMP -h $H -p $P -U $U --create -F p -O -s -f $exportfile ${DB}
} else { 
   "skip schema: $exportfile"
}

I use a convention that if something has been pulled, do not pull it again.
This enables me to selectively refresh pieces by deleting the local cache of those files.

Note that The PGDUMP command creates a schema file, but does NOT pull current sequence values.

$exportfile = "${DB}.sequence.sql"
if (! (test-path $exportfile)) { 
    $sql = @"
select N.nspname || '.' || C.relname as sequence_name
from pg_class C
join pg_namespace N on C.relnamespace=N.oid
where relkind='S'
and N.nspname like 'ds_%'
"@
    $listOfSequences = ($sql | & $PSQL -h $H -p $P -U $U -d $DB -t)
    $sql = @()
    foreach ($sequence in $listofsequences) { 
       $trim = $sequence.trim(); 
       if ($trim) { 
           "Interrogating $sequence"
           $lastval = ( "select last_value from $trim" | & $PSQL -h $H -p $P -U $U -d $DB -t ) 
           $sql += "select setval('${trim}', $lastval);" 
       }
    }
    $sql | set-content $exportfile
} else { 
    "skip sequence: $exportfile"
}

This gets complicated:

  • I am running a query to get every sequence in the system.. then for each of those sequences, I’m getting the last value.
  • I am doing this by executing PSQL and capturing its output as text; I could have done it with Npgsql called directly from powershell, but i didn’t go down that route at the time this was written.
  • I am saving the information in the form of a SQL statement that sets the value correctly. This eliminates the hassle of understanding the data format.
  • I am relying on the customer’s convention of prefixing their schema names with “ds_” to filter out the system sequences. You may need a different approach.

Update: My customer read through this post, and pointed out something I had missed: There’s a view called

pg_statio_user_sequences

which provides a list of sequences. Still need to loop to get the current values… nevertheless, nice to know!

foreach ($fulltable in $FULLTABLES) { 
  $exportfile = "${DB}.${fulltable}.data.sql";
  if (! (test-path $exportfile)) { 
     "Full: $exportfile"
     & $PGDUMP -h $H -p $P -U $U --inserts --disable-triggers -F p -E SQL_ASCII -O -a -t $fulltable -f $exportfile ${DB}

	 # we need to patch the set searchpath in certain situations
	 if ($exportfile -eq "deathstardb.ds_inbound.feeds.data.sql") { 
		 $content = get-content $exportfile
		 for($i=0; $i -lt $content.length; $i++) { 
			 if ($content[$i] -eq "SET search_path = ds_inbound, pg_catalog;") { 
				$content[$i]="SET search_path = ds_inbound, ds_contacts, pg_catalog;"; 
			 }
		 }
		 $content | set-content $exportfile
	 }

  } else { 
     "Skip full: $exportfile"
  }
}

This executes PG_DUMP on the tables where we want full data, and dumps them into “rerunnable sql” files.
However, some of the triggers (that are pulled with the schema) were badly written; they made assumptions on the runtime searchpath (a postgres thing) and thus failed.
I fixed that by adding some search and replace code to convert bad sql into good sql for the specific instances that were dying.

foreach ($partialtabletuple in $PARTIALTABLES) { 
  $partialtable = $partialtabletuple[0];
  $query = $partialtabletuple[1]; 
  $exportfile = "${DB}.${partialtable}.partial.sql"; 
  if (! (test-path $exportfile)) { 
      "Partial: $exportfile"
	  & $PSQL -h $H -p $P -U $U -c "copy ( $query ) to STDOUT " ${DB} > $exportfile
  } else { 
	 "skip partial: $exportfile"
  }
}

This runs PSQL in “copy (query) to STDOUT” mode to capture the data from a query to a file. The result is a tab seperated file.

LoadData.ps1

Things get much simpler here:

$PSQL = get-command psql.exe
$Env:PGCLIENTENCODING="SQL_ASCII"
$H="localhost"
$U="postgres"
$P="5432"
$DB="deathstardb"

. .\tableconfig.ps1

# PSQL
# -c = run single command and exit

$exportfile = "${DB}.schema.sql"
& $PSQL -h $H -p $P -U $U -c "drop database if exists ${DB};"
& $PSQL -h $H -p $P -U $U -f "${DB}.schema.sql"
& $PSQL -h $H -p $P -U $U -d ${DB} -f "${DB}.sequence.sql"

I’m going with the model that I’m doing a full wipe – i don’t trust anything locally, I am far too creative a developer for that — hence I drop the database and start fresh.
I create the schema from scratch (there are a few errors, hasn’t bitten me yet)
and then I set all the sequence values.

foreach ($fulltable in $FULLTABLES) { 
  $exportfile = "${DB}.${fulltable}.data.sql"
  & $PSQL -h $H -p $P -U $U -d ${DB} -f $exportfile
}

Important: The data is loaded IN ORDER (as defined in $FULLTABLES), so as to satisfy FK dependencies.
To figure out dependencies, I used pgadmin‘s “dependencies” tab on an object, and drew it out on paper.
It seemed daunting at first, but upon persevering, it was only 6-7 tables deep. A job I had in 2006 had (30+ total, 7 deep?) for comparison.

foreach ($partialtabletuple in $PARTIALTABLES) { 
  $partialtable = $partialtabletuple[0];
  $query = $partialtabletuple[1]; 
  $exportfile = "${DB}.${partialtable}.partial.sql"; 
  get-content $exportfile | & $PSQL -h $H -p $P -U $U -d ${DB} -c "copy $partialtable FROM STDIN "
}

Source Control

I check everything into source control (subversion for me):

GetData.ps1
LoadData.ps1
Data\tableconfig.ps1
Data\deathstardb.schema.sql
Data\deathstardb.sequence.sql
Data\deathstardb.ds_employees.employees.data.sql
Data\deathstardb.ds_contacts.contact_types.data.sql
Data\deathstardb.ds_inbound.processedfiles.partial.sql
(etc)

Important bits here:

  • My client did not have a copy of their schema in source control. Now they do.
  • The naming convention makes it easy to know what each file is.
  • I’m keeping the data in a seperate folder from the scripts that make it happen.

Additional Scripting

There are some additional scripts that I wrote, which I am not delving into here:

  • the script that, when applied to a copy of the production database, creates what I am developing with.
    • Luckily, what I’m doing is all new stuff, so I can rerun this as much as I want, it drops a whole schema and creates with impunity
  • the script to apply the above (dev) changes to my local database
  • the script to apply the above (dev) changes to my development integration database

Whenever I’m working with a database, I go one of two routes:

  • I use the above “make a copy of prod” approach as my “start over”, and only have a script of forward-changes
  • I make my script do an “if exists” for everything before it adds anything, so it is rerunnable.

With either approach its very important that when a production rollout occurs, I start a new changes script, and grab a new copy of the schema.

There is a newer third route – which is to use some kind of software that states with authority, “this is what it should be”, and allows a comparison and update to be made against an existing data source. Visual Studio Database Solutions are one such example, ERStudio is another. Hopefully, it does its job right! Alas, this client does not have that luxury.

In conclusion

Getting my development environment repeatable is a key to reducing stress. I believe The Joel Test calls it #2: “Can you make a build in one step?”.

I used a ton of tricks to get it to work.. it felt like I was never going to get there.. but I did. If you do something 3-4 times, you might want to automate it.

May your journey be similarly successful.