Excel C# Interop

imageI made a mistake.

I had a project in mind – to pull information from basecamp classic (todo items, with metdata embedded), massage it in Excel, and then upload it back to basecamp.   

I have always used Excel (or some equivalent spreadsheet) for project tracking – I never know which columns I need to add, when I need to highlight things in different colors, what kinds of charts I need to pull.  So the decision to bring the information down to Excel was an easy one.

That cost me 1.5 days of pain.

In retrospect, it would have been much easier to use EF Code First against .\SQLEXPRESS and use SSMS to do the bulk of my editing.   But, not having the time to re-tool, I pushed forward, and I did get it to work well enough to help me next week as I start work with that project…

Anyway, this is what I learned along the way:

Workbook.Save();
Workbook.Close(false);
Application.Quit();

The recipe for closing down the instance of Excel that had been open.   Without getting asked if I want to save.

Marshal.ReleaseComObject(sheets);

You have to do this for pretty much ever object you dink with, else the COM Excel wrapper stays open in the background. Reference: http://stackoverflow.com/questions/158706/how-to-properly-clean-up-excel-interop-objects). I haven’t fixed this yet.

Sheet.Cells[1,1].Value
Sheet.UsedRange.Cells[1,1].Value

Cells start at index 1, not 0.   UsedRange is faster than Cells.

var connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fullName + ";Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

You can use ACE to query spreadsheet data, however, type-coercion is sketchy.  As in, if you have no data, that long field will come back as string.    I ended up ditching that and reading/writing my lists of stuff directly from the worksheet so I could have direct control over the types:

Worksheet sheet1 = b.Worksheets["Tasks"];
t2 = sheet1.ReadListFromSheet<MyTask>();
sheet1.UpdateListToSheet(t2, "TodoItemId");
        public static List<T> ReadListFromSheet<T>(this Worksheet sheet) where T:new()
        {
            var nameToCol = GetNameToColumnDictionary(sheet);
            var colToSetter = GetColToSetterDictionary<T>(nameToCol);

            var result = new List<T>();
            var rowCount = sheet.UsedRange.Rows.Count; 

            for (int row = 2; row<=rowCount; row++)
            {
                var item = new T();
                foreach (var e in colToSetter)
                {
                    var col = e.Key;
                    var setter = e.Value;
                    var o = sheet.Cells[row, col].Value;
                    if (o == null) continue;
                    Type source = o.GetType();
                    var target = setter.GetParameters()[0].ParameterType;
                    try
                    {
                        object o2 = null;
                        if (target == typeof(bool?))
                        {
                            o2 = Convert.ChangeType(o, typeof(bool));  // bypass the null stuff
                        } else if (target == typeof(int?))
                        {
                            o2 = Convert.ChangeType(o, typeof (int)); 
                        } else if (target == typeof(DateTime?))
                        {
                            o2 = Convert.ChangeType(o, typeof (DateTime));
                        }
                        else
                        {
                            // this generic conversion seems to catch most everything
                            o2 = Convert.ChangeType(o, target);
                        }
                        setter.Invoke(item, new object[] {o2});

                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("Need to handle from type: {0}, value:{1} to setter type: {2}", source.FullName, o.ToString(), target.FullName);
                    }
                }
                result.Add(item);
            }
            return result; 
        }

        public static void UpdateListToSheet<T>(this Worksheet sheet, IEnumerable<T> lists, string pkColumnName)
        {
            pkColumnName = pkColumnName.Trim().ToLower();

            // Grab the list of columns
            var nameToCol = GetNameToColumnDictionary(sheet);

            // get a mapping from list member to column number / vice versa
            var colToGetter = GetColToGetterDictionary<T>(nameToCol);


            // grab the mapping of pk to row number
            if (!nameToCol.ContainsKey(pkColumnName)) throw new Exception("Could not locate primary key " + pkColumnName + " on sheet " + sheet.Name);
            var pkColumnNumber = nameToCol[pkColumnName];
            var pkToRow = new Dictionary<string, int>();
            var emptyRows = new List<int>();
            for (var r = 2; r <= sheet.UsedRange.Rows.Count; r++)
            {
                var pkValue = sheet.Cells[r, nameToCol[pkColumnName]].Value;

                if (pkValue == null)
                {
                    emptyRows.Add(r);
                }

                var test = pkValue.ToString();
                if (test == String.Empty) continue;
                pkToRow[test] = r;
            }

            // get a direct line to the getter of the pk
            if (!colToGetter.ContainsKey(pkColumnNumber)) throw new Exception("Could not locate primary column in DTO????");
            MethodInfo pkGetter = colToGetter[pkColumnNumber];

            // write the list contents into matching column names in the spreadsheet, updating as we go
            foreach (var item in lists)
            {
                var pkVal = pkGetter.Invoke(item, null).ToString();
                if (String.IsNullOrEmpty(pkVal)) continue; // inbound item has no PK

                // figure out which row to write to
                int rowToUpdate = -1;
                if (pkToRow.ContainsKey(pkVal))
                {
                    // row already exists .. update it
                    rowToUpdate = pkToRow[pkVal];
                }
                else
                {
                    // row does not exist .. append it
                    if (emptyRows.Count > 0)
                    {
                        rowToUpdate = emptyRows[0];
                        emptyRows.RemoveAt(0);
                    }
                    else
                    {
                        rowToUpdate = sheet.UsedRange.Rows.Count + 1;
                    }
                }

                // write it out
                foreach (var e in colToGetter)
                {
                    int col = e.Key;
                    MethodInfo getter = e.Value;
                    var o = getter.Invoke(item, null);
                    // any translations from .Net to Excel would happen here - none so far
                    sheet.Cells[rowToUpdate, col] = o;
                }
            }
        }

        #region Private

        private static Dictionary<string, int> GetNameToColumnDictionary(Worksheet sheet)
        {
            var nameToCol = new Dictionary<string, int>();
            for (var c = 1; c <= sheet.UsedRange.Columns.Count; c++)
            {
                var columnName = sheet.Cells[1, c].Value;
                if (!(columnName is string)) continue;
                if (String.IsNullOrEmpty(columnName)) continue;
                columnName = columnName.Trim().ToLower();
                nameToCol[columnName] = c;
            }
            return nameToCol;
        }

        private static Dictionary<int, MethodInfo> GetColToGetterDictionary<T>(Dictionary<string, int> nameToCol)
        {
            var colToGetter = new Dictionary<int, MethodInfo>();
            var properties = typeof (T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            foreach (var p in properties)
            {
                if (!p.CanWrite || !p.CanRead) continue;
                MethodInfo mget = p.GetGetMethod(false);
                if (mget == null) continue;
                var lowerPropName = p.Name.ToLower();
                if (!nameToCol.ContainsKey(lowerPropName)) continue; // not present in target xls
                colToGetter[nameToCol[lowerPropName]] = mget;
            }
            return colToGetter;
        }

        private static Dictionary<int, MethodInfo> GetColToSetterDictionary<T>(Dictionary<string, int> nameToCol)
        {
            var colToGetter = new Dictionary<int, MethodInfo>();
            var properties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
            foreach (var p in properties)
            {
                if (!p.CanWrite || !p.CanRead) continue;
                MethodInfo mset = p.GetSetMethod(false);
                if (mset == null) continue;
                var lowerPropName = p.Name.ToLower();
                if (!nameToCol.ContainsKey(lowerPropName)) continue; // not present in target xls
                colToGetter[nameToCol[lowerPropName]] = mset;
            }
            return colToGetter;
        }
  • Not yet memory-leak proof
  • Can only use a long as an id
  • Does not handle nullable columns gracefully

I’ll post more later on what I was trying to do.   I did finally get it to work.   Sketchy as it all is.  I love that word Sketchy… thank you Rutledge Wood!

Now I have to go pick up my race bib for a 10 mile race tomorrow.