Recently I had to solve a problem while importing a relatively large .CSV file to SQL Server from C# service code. This had to be done without using an ETL tool like SSIS. The .CSV files would come in the form of a Zip file uploaded by an ASP.NET MVC Web application user.
Some sample rows from the .CSV file, including the header row, are shown below:
Id, Some Description field, Another field, Yet another field, A Guid field
1046348264289,"Comma, in this field",no comma here,,48B6B3FE-C2E1-4B01-AB4E-
1046348264289,"Comma, in this field",no comma here,,58B6B3FE-C2E1-4B01-AB4E-
1046348264277,"Comma, in this field",no comma here,,48B6B3FE-C2E1-4B01-AB4E-
The requirement was to capture only the first and last fields in every row. There could be thousands of rows in a single file, and the zip file could contain many such files.
I initially started off by parsing the uploaded file in C# and capturing only the required fields in a structure that I would later turn into one big concatenated string, sending it down to a stored procedure for further parsing and insertion into the target table. The parsing routine and the data structure used to capture the data are below.
// TextFieldParser lives in Microsoft.VisualBasic.FileIO
// (add a reference to the Microsoft.VisualBasic assembly)
public static Dictionary<string, List<string>> ParseCsv(Stream stream)
{
    var mapDictionary = new Dictionary<string, List<string>>();
    using (var parser = new TextFieldParser(stream))
    {
        stream.Position = 0;
        parser.Delimiters = new string[] { "," };
        parser.ReadLine(); //read first line to advance and ignore the header fields
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (fields == null) continue;
            List<string> guids;
            if (!mapDictionary.TryGetValue(fields.FirstOrDefault(), out guids))
            {
                // first time we see this Id: start a new list of GUIDs for it
                guids = new List<string>();
                guids.Add(fields.LastOrDefault());
                mapDictionary.Add(fields.FirstOrDefault(), guids);
            }
            else
            {
                guids.Add(fields.LastOrDefault());
            }
        }
    }
    return mapDictionary;
}
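Against the sample rows above, this routine yields one dictionary entry per distinct Id, with the GUIDs from duplicate-Id rows collected into the same list. A hypothetical usage (the file name here is just for illustration):

using (var stream = File.OpenRead("sample.csv"))
{
    Dictionary<string, List<string>> map = ParseCsv(stream);
    // map["1046348264289"] holds two GUIDs (one per duplicate-Id row)
    // map["1046348264277"] holds one GUID
}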
I would later use the routine below to create the string to send down to the stored procedure:
private void SaveMappings()
{
    var mapped = new StringBuilder();
    foreach (var pair in _mapDictionary)
    {
        var guidsSb = new StringBuilder();
        foreach (var item in pair.Value)
        {
            guidsSb.Append(String.Format("{0},", item));
        }
        var guids = guidsSb.ToString().TrimEnd(',');
        // e.g. 1046348264289,48B6B3FE-C2E1-4B01-AB4E-90FBC98E1F81,58B6B3FE-C2E1-4B01-AB4E-90FBC98E1F82|
        mapped.Append(String.Format("{0},{1}|", pair.Key, guids));
    }
    //process all the mappings in one call to the backend
    _repository.RunStoredProcedure(mapped.ToString());
}
However, this turned out to be overly complex: processing the data in the stored procedure required a recursive string-split table-valued function, which was neither performant nor robust, to say the least.
The good news is that from SQL Server 2008 onward, I could simply make use of a user-defined table type and send the data I need to insert down to the back end as a table-valued parameter! No complex string parsing in the stored procedure anymore.
The user-defined table type method:
First, simply define the table structure you need in SQL like so
Next, simplify the earlier C# .CSV parsing routine like below. Note the use of the ADO.NET DataTable.CREATE TYPE [dbo].[ItemTopicMap] AS TABLE([ItemIdentifier] [bigint] NOT NULL,[TopicId] [uniqueidentifier] NOT NULL)GO
public static DataTable ParseCsv(Stream stream)
{
    var dt = new DataTable();
    dt.Columns.Add("ItemIdentifier", typeof(string));
    dt.Columns.Add("TopicId", typeof(Guid));
    using (var parser = new TextFieldParser(stream))
    {
        stream.Position = 0;
        parser.Delimiters = new string[] { "," };
        parser.ReadLine(); //read first line to advance and ignore the header fields
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (fields == null) continue;
            if (fields.FirstOrDefault().Length > 10 && IsValidGuid(fields.LastOrDefault()))
            {
                dt.Rows.Add(fields.FirstOrDefault(), fields.LastOrDefault());
            }
            else
            {
                //log line out to a bad file with filename and line number
            }
        }
    }
    return dt;
}
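The IsValidGuid helper used above is not shown here; a minimal sketch built on Guid.TryParse could look like this:

private static bool IsValidGuid(string candidate)
{
    // Guid.TryParse rejects truncated values, like the GUIDs in the
    // malformed sample rows earlier, without throwing
    Guid parsed;
    return Guid.TryParse(candidate, out parsed);
}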
Then, inside the repository's stored-procedure execution method, do something like this:
public void RunStoredProcedure(DataTable dataTable)
{
    var connectionString = ConfigurationManager.ConnectionStrings["DefaultConn"].ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        var proc = new SqlCommand("InsertItemTopics", connection);
        proc.CommandType = CommandType.StoredProcedure;
        // pass the whole DataTable down as a table-valued parameter
        proc.Parameters.AddWithValue("@ItemTopicMap", dataTable);
        proc.Parameters["@ItemTopicMap"].SqlDbType = SqlDbType.Structured;
        proc.Parameters["@ItemTopicMap"].TypeName = "dbo.ItemTopicMap";
        proc.CommandTimeout = 180;
        proc.ExecuteNonQuery();
    }
}
And voilà, no more parsing of a huge string on the back end!
Note that the table type defined here acts like a read-only temporary table on the back end: the stored procedure can read from it to do any further validation and ultimately insert or upsert the data into the actual destination table.
For completeness, a simplified version of the stored procedure InsertItemTopics is shown below:
CREATE PROCEDURE [dbo].[InsertItemTopics]
(
    @ItemTopicMap ItemTopicMap READONLY
)
AS
BEGIN
    SET NOCOUNT ON

    TRUNCATE TABLE ItemTopicMapDestination

    INSERT INTO ItemTopicMapDestination (ItemIdentifier, TopicId)
    SELECT ItemIdentifier, TopicId FROM @ItemTopicMap
END
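Finally, to tie this back to the original requirement of a zip upload containing many such .CSV files, a sketch of the glue code using System.IO.Compression's ZipArchive is below. The ImportZip method name and the _repository field are my assumptions; only ParseCsv and RunStoredProcedure come from the code above.

public void ImportZip(Stream uploadedZip)
{
    using (var archive = new ZipArchive(uploadedZip, ZipArchiveMode.Read))
    {
        foreach (var entry in archive.Entries)
        {
            if (!entry.Name.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
                continue;
            // zip entry streams are not seekable, and ParseCsv rewinds the
            // stream, so copy each entry into a MemoryStream first
            using (var entryStream = entry.Open())
            using (var buffer = new MemoryStream())
            {
                entryStream.CopyTo(buffer);
                var dataTable = ParseCsv(buffer);
                //one stored procedure call per .CSV file in the zip
                _repository.RunStoredProcedure(dataTable);
            }
        }
    }
}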