Saturday, January 11, 2014

Using a User Defined Table Type as an alternative to Sql Bulk Insert in SQL Server


Recently I had to solve a problem importing a relatively large .CSV file into SQL Server from C# service application code. This had to be done without using an ETL tool like SSIS. The .CSV files arrived in a Zip file uploaded by an ASP.NET MVC web application user.

Some sample rows of the data, along with the header row of the .CSV file, are shown below:

Id, Some Description field, Another field, Yet another field, A Guid field
1046348264289,"Comma, in this field",no comma here,,48B6B3FE-C2E1-4B01-AB4E-90FBC98E1F81
1046348264289,"Comma, in this field",no comma here,,58B6B3FE-C2E1-4B01-AB4E-90FBC98E1F82
1046348264277,"Comma, in this field",no comma here,,48B6B3FE-C2E1-4B01-AB4E-90FBC98E1F81

The requirement was to capture only the first and last fields in every row. There could be thousands of rows in a single file, and the zip file could consist of many such files.

I initially started off by parsing the uploaded file in C# and capturing only the required fields in a data structure that I would later turn into one big concatenated string, which was sent down to a stored procedure for further parsing and insertion into the target table. The parsing routine and the data structure used to capture the data are shown below.

        //TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace
        public static Dictionary<string, List<string>> ParseCsv(Stream stream)
        {
            var mapDictionary = new Dictionary<string, List<string>>();
            using (var parser = new TextFieldParser(stream))
            {
                stream.Position = 0;
                parser.Delimiters = new string[] { "," };

                parser.ReadLine(); //read and discard the header line

                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();

                    if (fields == null) continue;

                    List<string> guids;
                    if (!mapDictionary.TryGetValue(fields.FirstOrDefault(), out guids))
                    {
                        guids = new List<string>();
                        mapDictionary.Add(fields.FirstOrDefault(), guids);
                    }
                    guids.Add(fields.LastOrDefault());
                }
            }

            return mapDictionary;
        }


I would later use the routine below to create the string to send down to the stored procedure:

        private void SaveMappings()
        {
            var mapped = new StringBuilder();
            foreach (var pair in _mapDictionary)
            {
                var guidsSb = new StringBuilder();
                foreach (var item in pair.Value)
                {
                    guidsSb.Append(String.Format("{0},", item));
                }
                var guids = guidsSb.ToString().TrimEnd(',');
                mapped.Append(String.Format("{0},{1}|", pair.Key, guids)); 
//1046348264289,48B6B3FE-C2E1-4B01-AB4E-90FBC98E1F81,58B6B3FE-C2E1-4B01-AB4E-90FBC98E1F82|
            }
            //process all the mappings in one call to the backend
            _repository.RunStoredProcedure(mapped.ToString());
        }

However, this turned out to be overly complex: processing the data in the stored procedure required a recursive string-split table-valued function as well. Not very performant, and error prone to say the least.

The good news is that from SQL Server 2008 onwards, I could simply make use of a user-defined table type and send the data I need to insert down to the back-end as a table-valued parameter! No more complex string parsing in the stored procedure.


The user-defined table type method:

First, simply define the table structure you need in SQL like so:

CREATE TYPE [dbo].[ItemTopicMap] AS TABLE(
        [ItemIdentifier] [bigint] NOT NULL,
        [TopicId] [uniqueidentifier] NOT NULL
)
GO
Next, simplify the earlier C# .CSV parsing routine as shown below. Note the use of the ADO.NET DataTable.

        public static DataTable ParseCsv(Stream stream)
        {
            var dt = new DataTable();
            //column types should match the user-defined table type [dbo].[ItemTopicMap]
            dt.Columns.Add("ItemIdentifier", typeof(long));
            dt.Columns.Add("TopicId", typeof(Guid));
            using (var parser = new TextFieldParser(stream))
            {
                stream.Position = 0;
                parser.Delimiters = new string[] { "," };
                parser.ReadLine(); //read and discard the header line
                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();
                    if (fields == null) continue;

                    var first = fields.FirstOrDefault();
                    var last = fields.LastOrDefault();
                    long id;
                    if (first != null && first.Length > 10 &&
                        long.TryParse(first, out id) && IsValidGuid(last))
                    {
                        dt.Rows.Add(id, Guid.Parse(last));
                    }
                    else
                    {
                        //log line out to bad file with filename and linenumber
                    }
                }
            }
            return dt;
        }

Then, inside the repository's stored procedure execution method, do something like this:

  public void RunStoredProcedure(DataTable dataTable)
  {
      var conStr = ConfigurationManager.ConnectionStrings["DefaultConn"].ConnectionString;
      using (var connection = new SqlConnection(conStr))
      {
          connection.Open();
          var proc = new SqlCommand("InsertItemTopics", connection);
          proc.CommandType = CommandType.StoredProcedure;
          var param = proc.Parameters.AddWithValue("@ItemTopicMap", dataTable);
          param.SqlDbType = SqlDbType.Structured; //table-valued parameter
          param.TypeName = "dbo.ItemTopicMap";
          proc.CommandTimeout = 180;
          proc.ExecuteNonQuery();
      } //connection is closed and disposed here
  }

And voila, no further parsing of a huge string in the back-end!

Note that the table type defined here acts like a read-only table variable on the back-end: a stored procedure can read from it for any further validation before ultimately inserting or upserting the data into the actual destination table.

For completeness, a simplified version of the stored procedure InsertItemTopics is shown below:

CREATE PROCEDURE [dbo].[InsertItemTopics]
( @ItemTopicMap ItemTopicMap READONLY )
AS
BEGIN
    SET NOCOUNT ON;
    TRUNCATE TABLE ItemTopicMapDestination;
    INSERT INTO ItemTopicMapDestination (ItemIdentifier, TopicId)
        SELECT ItemIdentifier, TopicId FROM @ItemTopicMap;
END


Saturday, January 4, 2014

The JavaScript Module Pattern


The module pattern allows private members that maintain state through closure, while exposing only public members. So only a public API is exposed, and everything within the closure (the nested functions using the private variables) stays private. The pattern is similar to an IIFE, except that an object is returned instead of the result of a function execution.

Note that JavaScript doesn't have access modifiers, so we use closures and function scope to simulate private members and maintain state inside a module.

//Shopping Cart Module
var shoppingCart = (function () {
    var cart = []; //private

    //public API
    return {
        addItem: function (value) {
            cart.push(value);
        },
        itemCount: function () {
            return cart.length;
        },
        totalPrice: function () {
            var i = this.itemCount(),
                p = 0;

            while (i--) {
                p += cart[i].price;
            }
            return p;
        }
    };
}());

The shopping cart module is self-contained in a global object called shoppingCart. The cart array is private and only exists within the module's closure. The addItem, itemCount and totalPrice methods can access the cart array, as it is within the scope of the outer function.

Because the outer function is immediately invoked, the module returns an object that gets assigned to shoppingCart. This way we can interact with the shoppingCart object as follows:

//call an API method of the module
shoppingCart.addItem({item:'Effective JavaScript',price:8.99});

//this will work
console.log(shoppingCart.itemCount());
console.log(shoppingCart.totalPrice());

//this will return undefined
console.log(shoppingCart.cart);

Notice that the methods defined in the shoppingCart module are effectively namespaced under shoppingCart, which avoids collisions in the global scope. Of course, the shoppingCart variable itself is still global, so a clash remains possible if another global variable called shoppingCart is declared.

Also, an object literal can be used to define a module by just assigning it to a variable! This is sufficient if we don't need to maintain state or require private variables. An object literal is not instantiated with the new operator; we simply start using the declared object as the single instance.
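For example, a stateless utility module (the names here are purely illustrative) can be just an object literal:

```javascript
//A module as a plain object literal - no closure, so no private state
var mathUtils = {
    pi: 3.14159, //every property is public
    areaOfCircle: function (radius) {
        return this.pi * radius * radius;
    }
};

//No new operator needed - the object itself is the single instance
console.log(mathUtils.areaOfCircle(2)); //12.56636
```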

Why use Modules in JavaScript?


The short answer: to mimic classes and to organize code!

It mimics classes for code encapsulation - not inheritance per se.
JavaScript has prototypal inheritance, where we can dynamically add members to the prototype property of the constructor function we define. Members added to the prototype are shared across all instances - even instances created before the member was added, since prototype lookup happens at call time - making them akin to static members. Members defined in the constructor function itself, if any, belong to individual instances and are attached via the "this" keyword.
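A quick sketch of this behavior (the Book constructor is made up for illustration):

```javascript
function Book(title) {
    this.title = title; //per-instance member, attached via "this"
}

var first = new Book('Effective JavaScript');

//Add a member to the prototype AFTER "first" was created...
Book.prototype.describe = function () {
    return 'Book: ' + this.title;
};

var second = new Book('JavaScript Patterns');

//...yet both instances see it, because prototype lookup happens at call time
console.log(first.describe());  //Book: Effective JavaScript
console.log(second.describe()); //Book: JavaScript Patterns
```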

A module can be a distinct unit of functionality, like a navigation module, a filter module, an email notification module etc. We can think of it as a JavaScript equivalent of a .NET component. Decomposing your code into modules helps maintainability of your application by keeping them loosely coupled and removing dependencies where possible.
  • Allows encapsulation of private members, giving the code an object-oriented flavor
  • Avoids function name clashes with other parts of the JavaScript code, and so helps with code organization
  • By immediately invoking the function expression, it returns only the object container (or function), with the values of the private members remembered through closure. So only a public API in the form of an object is returned, with everything else kept private
  • Very similar to an IIFE, except that an object is returned, rather than the result of a function call
Sample module to avoid global scope contamination using Closure and the Revealing Module Pattern:

var module = (function () {
    var privateVariable; //private member

    /* A private function uses the private variable in a closure (read: nested function).
       This allows the private member to be declared (and initialized) only once and
       exposed indirectly, without polluting the global namespace */
    var privateFunction = function (x) {
        /* uses the private variable */
    };

    /* Another private function - only named "public" here because we are going to
       expose it outside the module definition */
    var publicFunction = function () {
        privateFunction(); /* can call the private function */
    };

    /* return an object literal - the public API */
    return {
        PublicFunction: publicFunction /* exposed as a public function here! */
    };
}()); /* outer function is immediately executed here */

/* Note that this is a Singleton pattern as it is immediately executed here, and we cannot create
   multiple instances of this module */
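If multiple independent instances are needed, one variation (a sketch, not part of the classic pattern above) is to drop the immediate invocation and call the outer function as a factory, so each call builds a fresh closure:

```javascript
//A module "factory": each call creates its own private state
var createCounter = function () {
    var count = 0; //private to this particular instance

    return {
        increment: function () { return ++count; },
        current: function () { return count; }
    };
};

var a = createCounter();
var b = createCounter();
a.increment();
a.increment();
console.log(a.current()); //2
console.log(b.current()); //0 - b has its own private count
```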

In modern JavaScript, these four object-oriented design patterns

  1. Prototype Pattern
  2. Module Pattern
  3. Revealing Module Pattern
  4. Revealing Prototype Pattern

can be implemented using

  • AMD - allows modules to be loaded asynchronously, without worrying about script loading order
  • CommonJS modules - loaded synchronously, so in the browser the script loading order still matters; designed primarily for server-side environments
  • ECMAScript Harmony - native modules in the next version of JavaScript!