Lightweight DataTable Serialization

We all know untyped data structures like DataTable and DataSet should not be passed around but sometimes - just sometimes - you got to do it because it makes sense and because it’s the most cost effective way to meet your goals. However passing things like DataTable over WCF can kill performance because of huge serialization overhead in both space and time.

So if you really had to go ahead with this crazy idea of sending DataTable over WCF then here’s the somewhat more efficient serialization technique you can use. The basic idea is to use binary serialization of DataTable and pass that serialized data as byte array along with the schema information so the client can reconstruct it on other end. It’s needless to say that doing this would invariably restrict your WCF clients to .Net so you might also want to include other web method for other clients.

public static void LightWeightSerialize(DataTable myDataTable, out byte[] serializedTableData, out string tableSchema)
{
    //Get all row values as jagged object array
    object[][] tableItems = new object[myDataTable.Rows.Count][];
    for (int rowIndex = 0; rowIndex < myDataTable.Rows.Count; rowIndex++)
    tableItems[rowIndex] = myDataTable.Rows[rowIndex].ItemArray;

    //binary serialize jagged object array
    BinaryFormatter serializationFormatter = new BinaryFormatter();
    MemoryStream buffer = new MemoryStream();
    serializationFormatter.Serialize(buffer, tableItems);
    serializedTableData = buffer.ToArray();


    //Get table schema
    StringBuilder tableSchemaBuilder = new StringBuilder();
    myDataTable.WriteXmlSchema(new StringWriter(tableSchemaBuilder));
    tableSchema = tableSchemaBuilder.ToString();
}

And here’s the deserializer to go with it:

public static DataTable LightWeightDeserialize(byte[] serializedTableData, string tableSchema)
{
    DataTable table = new DataTable();
    table.ReadXmlSchema(new StringReader(tableSchema));

    BinaryFormatter serializationFormatter = new BinaryFormatter();
    MemoryStream buffer = new MemoryStream(serializedTableData);
    object[][] itemArrayForRows = (object[][]) serializationFormatter.Deserialize(buffer);

    table.MinimumCapacity = itemArrayForRows.Length;
    table.BeginLoadData();
    for (int rowIndex = 0; rowIndex < itemArrayForRows.Length; rowIndex++)
    table.Rows.Add(itemArrayForRows[rowIndex]);
    table.EndLoadData();

    return table;
}

How efficient is this? It really depends on your data. For instance, with some of my test data with 10K rows I could get about 6X smaller payload size and 30% faster serialization. But as number of rows increases, the speed advantage diminishes compared to built-in XML serializer that you can access via ReadXml/WriteXml. For example, for a million row, above method still gives me 4X smaller payload but serialization is actually 3X slower than built-in XML serializer. So experiment before you go either way!

The Best Culture Invariant Format for DateTime

If you are looking to display how to display DateTime as text without causing confusion to users in different countries then good choices is either "o" or "r". The "o" format is in general more preferable as it also puts timezone offset.

long t = DateTime.Now.Ticks;
Console.WriteLine((new DateTime(t)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Local)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Unspecified)).ToString("o"));
Console.WriteLine((new DateTime(t, DateTimeKind.Utc)).ToString("o"));

Prints followings when actual date time is 2009-11-08T17:16:13.7791953 PST:

2009-11-08T17:16:13.7791953
2009-11-08T17:16:13.7791953-08:00
2009-11-08T17:16:13.7791953
2009-11-08T17:16:13.7791953Z

If you use "r" instead it would print followings:

Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT
Sun, 08 Nov 2009 17:26:02 GMT

Find Path of a Command Line Tool

Many times you work on different machines, execute a command line tool but often wonder where that tool is actually installed. One way to figure this out is to look at all environment PATH variables and search them manually in same order as Windows does. But you don’t have to because luckily there is a little known built-in command called WHERE that does that for you:

This is similar to Unix commands like WHICH and WHEREIS.

Selecting Random Row From SQL Server Table

It is important to make sure your automated tests covers various real-world data combinations (for instance, some columns could be null or some rows could be duplicate). For perf testing you want to reduce effects of caching by not firing same SQL over and over. In these cases, ability to select a random row for your test could come in handy and here’s neat little trick to do it:

select top 1 * from table order by newid()

Intel Xeon – the worse branding ever?

This is really frustrating if you didn’t knew. Intel likes to call many of their processors for “business class” systems “Xeon”. When they say Xeon it can mean anything from those dinosaurs based on Pentium II and the latest and greatest ones based on on Core i7. So when you see a system built with Xeon processor you really need to look up its exact subtle number and carefully study its specs to figure out what it really is.

For example, Xeon W3520 is same as latest Core i7-920, including the price. The only difference between two is Xeon supports ECC memory. So what’s the point in branding this processor as Xeon instead of Core 2 i7?

BTW, new Intel Core i7 series is definitely a winner and finally makes 2 year old Intel Core 2 series actually obsolete. This processor can execute 8 threads simultaneously and supports 3 channels for DDR3 access tripling the RAM bandwidth! Best of all, it costs more or less same as older Core 2 processors.

What’s in a name?

When you want to store the name of a person a typical design starts out by creating two fields (in database or class):

Person

First Name
Last Name

Soon you realize lot of people have middle name, especially, when name change occurs after marriages. So you go and add one more field:

Person

First Name
Middle Name 
Last Name

This is all good… until you encounter people in countries such as Spain and Cuba who have custom to have two last names. Both are equally important and both are required in any official document (including ones your website or app may print out). So you go in and add one more field while thinking this ought to do it once and for all:

Person

First Name
Middle Name 
Last Name
2nd Last Name

Not so fast… Lot of people from Hong Kong and few other places in Asia actually carry two first names. One of these first names is traditional while other is typically a Western/Roman name. Both first names are important and often many people will know only Western/Roman first name of a person although official documents would only refer to traditional names.

For example, consider name of Hong Kong’s Chief Secretary Anson Chan Fang On Sang. Here Anson is English given name, On Sang is Chinese given name, Chan is husband’s surname and Fang is her own surname.

So time to add few more field so we can store everybody’s names on planet without loss of semantics:

Person

Traditional Given Name
English Given Name
Middle Name 
Last Name
2nd Last Name

Ok… so are we done now? Well, almost! We are still missing at least two critical pieces of information: Salutation and Suffix.

Example of common salutations are Dr, Mr, Mrs, Mr. While salutations are quickly falling out of fashion it might be still required, for example, if you are printing out an official letter to your customer and don’t want to make it look very casual.

Example of common suffixes are Jr, Sr, III, IV etc. These are required in official/legal communication to avoid confusion with other family members of a person.

Person

Salutation
Traditional Given Name
English Given Name
Middle Name 
Last Name
2nd Last Name
Suffix

Now we have covered most of the globe. There are still two more nice-to-have fields if you want to make your customers happy: Phonetic Given Name and Phonetic Last Name. Remember the times when you call customer support and each time you have a guy struggling to say your name? These two fields would avoid those moments:

Person

Salutation
Traditional Given Name
English Given Name
Middle Name 
Last Name
2nd Last Name
Phonetic Given Name
Phonetic Last Name
Suffix

So there you have it. A structure that can store almost anybody’s name on planet while maintaining semantics of each component of a name.

Most applications won’t need to go to this extreme because it’s OK to just have one first name and one last name that correctly identifies a person for its purpose even if it’s culturally incorrectly and technically incomplete. However if you are in a business where legal implications are high or if any information loss about your customer is not tolerable then it’s good to think about these possibilities.

There are probably better solutions than giant structure like above just to store name of a person. Instead of having all these different fields you can simply have one free form field, say, Full Name and another field called Full Name Style which takes values indicating how different components of names are arranged:

Person

Full Name
Full Name Style

This structure will make searches for specific components of a name little difficult but it would extend well as your application grows around the planet.

Solving Shared Notebook Sync Issue With OneNote 2010

Since about 3 years we used Groove to share calendar, notes and files within family – until I discovered a feature in OneNote called “Shared Notebooks”. The Shared Notebooks are just like any other OneNote notebooks with a difference that they get synced with other people! If someone added new note or modified a note you get it next time and vice a versa. On conflicts it created new pages and also you can take automated backups. This feature requires either file share or SharePoint. So I’ve now got my personal SharePoint website on Internet (which costs $10 per year) to host our shared OneNotes as well as our shared calendar that gets synced in Outlook.

Unfortunately in OneNote 2010 Technical Preview, the sync stopped working because OneNote for some reason does not popup a dialog to ask for a password to connect to SharePoint website on Internet anymore. Very troublesome. But here’s the work around I’ve found:

  1. Right click on the Notebook, select Properties.
  2. Click on Change Location button.
  3. Type URL of your SharePoint website. This will popup password dialog.
  4. Cancel all dialogs and sync! It should work now.

Snoqualmie Falls Hike Analysis

Today was perhaps my 32nd hike at Snoqualmie Falls. I've adopted this trail as my daily hike routine. Today I brought my new Garmin Oregon 400t GPS with me and recorded the track. Fortunately there is a website (although not that good) called MotionBased which can slice and dice and analyze this track data and tell you lot of cool things.

Here's the MotionBased's analysis of Snoqualmie Falls hike.

Few interesting things:

  • This is a 1.4 miles hike with 475 ft elevation gain.
  • The hike takes about 35 minutes.
  • Ascent and descent speeds as well as times are roughly same (note that I do not made any stops except at ends of the trail).
  • My speed while ascending is 2.4 mph and vertical speed is 14 ft/min or 840 ft/hr. Not too bad.
  • This hike has average grade of 16%. This is just 1% more than max allowed by typical gym treadmills. At steepest point the trail is 30% grade while about the "safe" max limit on typical maintained trails.

Using my watch which can measure elevation I thought that was 300ft elevation hike. Obviously my watch can easily be 100ft off. Here's the elevation profile of this hike:

SnoqualmieHikeProfile

Treadmill Grade to Angle Conversion

Gym machines like treadmills and even some roads express the slop as "grade" in percentage. For example, most gym treadmills allow up to 15% grade. I've left going to gym since last few months and instead I'm doing this Snoqualmie Falls hike almost every other day (which is 0.5 miles one way and 300ft elevation). But occasionally I have to go to gym because of early sunsets. I usually put max grade (15%) and speed walk for one mile burning 350 cal. So naturally the question is how many feet I climbed?

For this I need to convert grade in % to angle. But what is "grade%"? Turns out it's ration of rise/run or in other words,

grade=\frac{tan(\theta)}{100}

So your vertical climb in feet is given by, \sin(\tan^{-1}(grade*100))*5280 miles

So by that calculation, at maximum grade on gym treadmill I climb 783.2 feet for each mile I walk. Not too bad.

 

Via conversation here.