Using R to load CSV and Excel files in Sql Server 2016

November 14, 2015

It's been more than two and a half years since I last blogged. Lots of changes personally and professionally :) . I finally had some time to learn and test some of the new features coming in Sql Server 2016.

As soon as I heard about "integration of R" as a new feature in Sql Server 2016, I couldn't resist trying it out for one of the frequent and important tasks I do when I have to analyze data: loading .csv or .xlsx file data into Sql Server. Loading .csv and .xlsx files into Sql Server was always possible through tsql (bulk insert), SSIS or Import/Export, but there were always some hoops to go through when loading .xlsx data. With the introduction of R in Sql Server I think we have one more easy way to load these types of files. Let's see how it works.

"Assuming you have already installed Sql Server 2016 CTP 3 and run all the post-installation configurations for R."
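For reference, the post-installation piece mostly boils down to flipping one sp_configure option. A minimal sketch (the option needs a restart of the SQL Server service before run_value shows 1):

-- Enable external script execution (R) on the instance; requires a service restart to take effect.
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- Verify after the restart: run_value should show 1.
EXEC sp_configure 'external scripts enabled';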

I will load the baseball.csv file into Sql Server first. With the introduction of R in Sql Server 2016 we can execute R scripts directly from SSMS using the stored procedure sp_execute_external_script, like this:

execute sp_execute_external_script
@language = N'R'
, @script = N'setwd("D:/csvdirectory");
mytextvariable <- read.csv("baseball.csv", header = TRUE);
OutputDataSet <- data.frame(mytextvariable);'
, @input_data_1 = N' '
WITH RESULT SETS (([Team] Char(5) not null, [League] char(3) NOT NULL, [Year] int not NULL, Win int not null, Playoffs bit not null));

I am using 3 parameters here:

@language specifies the script language, "R".

@script is the parameter where all the R code goes. This is where I included my basic R script. In this script I first change the working directory to the directory where the baseball.csv file exists; read.csv() is the R function that reads a csv file. Finally I convert the csv data to a data frame and pass it to the OutputDataSet variable. Since we generate the values using just the R script, we leave the input query string in @input_data_1 blank. We could also put any valid sql statement there; since the script never references the input data set, it would simply be ignored.

The WITH RESULT SETS clause is where I define the column names and types. I simply used the same column names as in the .csv file. When this script executes, the values defined by WITH RESULT SETS are returned in the Results pane in SSMS. Here is the screenshot.



Now, what if I want to load this data into a table for further analysis or to join with other tables? I couldn't find any published way to do this, so I thought of trying it out myself. The first thing that came to mind was the Insert...Exec method for loading stored procedure results into a table. And after a couple of dynamic sql and single-quote related issues, it finally worked :)

Here is how it works:

I created a temp table to store the data. This could be a permanent table too.

create table #t ([Team] Char(5) not null, [League] char(3) NOT NULL, [Year] int not NULL , Win int not null, Playoffs bit not null );

After that I insert into the temp table and execute the R script as dynamic sql. It's the same script as above, but executed as dynamic sql like this:

Insert into #t
exec ('execute sp_execute_external_script
@language = N''R''
, @script = N''setwd("D:/csvdirectory");
mytextvariable <- read.csv("baseball.csv", header = TRUE);
OutputDataSet <- data.frame(mytextvariable);''
, @input_data_1 = N'' ;''
WITH RESULT SETS (([Team] Char(5) not null, [League] char(3) NOT NULL, [Year] int not NULL, Win int not null, Playoffs bit not null));');

select * from #t;

Voila. The data is loaded into the table. Now we can do all kinds of analysis or use it whatever way we need to.

I am going to use this a lot to automate .csv and .xlsx file uploads to tables. I didn't check it with huge csv or xlsx files, but I think some performance penalty is acceptable for my scenarios given the other benefits of this approach.
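To give a rough idea of the automation, here is a minimal sketch of wrapping the same pattern in a stored procedure. The procedure name and parameters (nsp_LoadCsvToTemp, @dir, @file) are just illustrative, the directory should use forward slashes as in the examples above, the column list is hard-coded to the baseball file, and the caller is assumed to have created #t already:

-- Hypothetical wrapper: builds the same dynamic sql with the directory and file name passed in.
create procedure nsp_LoadCsvToTemp
@dir  nvarchar(260),
@file nvarchar(260)
as
begin
set nocount on;

declare @sql nvarchar(max);

-- Doubled single quotes survive the dynamic sql layer; R strings keep their double quotes.
set @sql = N'execute sp_execute_external_script
@language = N''R''
, @script = N''setwd("' + @dir + N'");
mytextvariable <- read.csv("' + @file + N'", header = TRUE);
OutputDataSet <- data.frame(mytextvariable);''
, @input_data_1 = N'' ;''
WITH RESULT SETS (([Team] Char(5) not null, [League] char(3) NOT NULL, [Year] int not NULL, Win int not null, Playoffs bit not null));';

-- #t is the caller''s temp table with matching columns.
insert into #t
exec (@sql);
end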

Loading an .xlsx file requires loading the xlsx library in the R script, otherwise it won't work. I had to install the xlsx package into my R environment first; after that it is the same script as above with one extra line of code to load the xlsx library (the library(xlsx) line below).

Here is the script to load excel files:

Insert into #t
exec ('execute sp_execute_external_script
@language = N''R''
, @script = N''library(xlsx);
setwd("R:/DBA");
mytextvariable <- read.xlsx("baseball.xlsx", sheetName = "Sheet1");
OutputDataSet <- data.frame(mytextvariable);''
, @input_data_1 = N'' ;''
WITH RESULT SETS (([Team] Char(5) not null, [League] char(3) NOT NULL, [Year] int not NULL, Win int not null, Playoffs bit not null));');

That's it. I am really excited to see a lot of cool features in Sql Server 2016. Some of these features are definitely going to help me a lot in my day to day work. A couple of features I am looking forward to learning and working with are R in Sql Server and loading\querying JSON format data.

I will continue to test and learn some more R related Sql Server features and blog about them here.





Categories: Sql Server

Load JSON to SQL Server – Challenges and Alternatives (MongoDB)

February 17, 2013

Have you ever tried loading JSON data into SQL Server? If yes, then you would agree that it is not easy to get it done.
JSON (JavaScript Object Notation) is a lightweight data-interchange format. JSON is widely used everywhere now, mostly by web developers to create stateful web applications.

Simplified example:

{
        "name" : "James",
        "Age" : 50,
        "Favourite_Color" : "Purple"
}

In other words, a JSON document is a collection of key\value pairs which can be nested to several levels. A JSON document (every record is called a document in JSON) can have several subdocuments with multiple levels of arrays and lists.

Now, for one new project we had a requirement to load vendor data into sql server. This vendor data was in JSON format with a really complex structure and several levels of nesting. We were getting around 20 JSON files daily from the vendor that I needed to load into the DB.

There is no easy way to load JSON data in Sql Server: no in-built method\function or SSIS component for this task. I binged/googled to see how others are doing it, since this format is so widely used, and didn't find anything except one Simple-Talk blog post from Phil Factor.

Really good article. The author created a TSQL function that uses complex string manipulation to parse JSON data. I tried it first with some sample data, which worked fine, but the data sent to us was really huge (some of the files were around 1 GB) with crazy complex nesting.
The process kept failing with one error or another. So first I tried to modify and enhance this function so that it would work for my dataset, but the file structure was too complex to make it work.
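For context, usage of the Simple-Talk approach looks roughly like this. I am writing it from memory, so treat the function name and output shape as assumptions; the article defines a dbo.parseJSON table-valued function that shreds a document into a name/value hierarchy which you then re-pivot into your own tables:

-- Assumed shape from the Simple-Talk article: dbo.parseJSON(@json NVARCHAR(MAX)) returns a hierarchy table.
SELECT *
FROM dbo.parseJSON(N'{ "name" : "James", "Age" : 50, "Favourite_Color" : "Purple" }');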

I have no idea why Sql Server doesn't provide an easy way to do it. I see there is a Connect case opened for this feature with overwhelming support, but nothing from MS yet :( I also commented on it a while back. Please upvote if you are struggling to load JSON data into Sql Server.
Here is the Connect item link.

Next I tried C# to convert the JSON data to xml and then used ssis to load it into the DB. .Net has JSON serializer and deserializer classes available for this, but I used the Newtonsoft JSON library, which really made it easy. Once a file is converted to xml, ssis loads it into the DB.

Well, this worked, but (there is always a but :) ) setting up the xml load wasn't easy. The conversion to xml was slow, and one of the converted xml files was loading around 19 tables, so you can guess the nesting here. We were getting different files every now and then, which made this process difficult to maintain, but we had no option except to live with it, with quite a bit of manual intervention every now and then :(

Now let's talk about the other alternatives (NoSQL).
There are several NoSql platforms that make loading these JSON files easy: MongoDB, CouchDB etc.
After all the dust settled down I started looking into some of these options. MongoDB for one is really easy to set up and work with. I won't go into installation and setup as it is easily available on the MongoDB website. Once MongoDB is installed, loading JSON data is a piece of cake.
To load a JSON file into the DB, I would run one single-line command.

At the command prompt in the bin directory:

>mongoimport --db orders --collection ordDetail < c:\orders.json

All done, file loaded. If the db (orders in my case) doesn't already exist, the above mongoimport command will create the database. A collection represents a table in MongoDB, and if the collection (ordDetail in my case) doesn't already exist, it will be created too. MongoDB is a flexible-schema database system which works great for json, csv and tsv format files.

Once the file is loaded, here are some basic MongoDB queries to check the data.

Count how many documents were loaded, i.e. the total number of documents in the ordDetail collection; this is the equivalent of select count(*) from a table in relational db's: db.ordDetail.count()

To fetch one row\document from the collection: db.ordDetail.findOne()


Sample output after masking the data:

{
        "_id" : ObjectId("511dc7e3476eee41aa7a073a"),
        "name" : "Burroug's",
        "OrderNumber" : "9187157999-3909592",
        "Amount" : 48.4,
        "impression" : 0,
        "customField1" : "jhhgh10"
}


To query everything (the equivalent of select * from a table): db.ordDetail.find()


I will blog about MongoDB query commands in future posts.

The application owner wanted to move the application to MongoDB now, but the project was put on hold, as several other challenges come into the picture when working with NoSQL dbs.

The argument continues :)

To conclude, that was my journey trying to load JSON into Sql Server. Quite a few other RDBMSs already have functionality to load JSON data. I hope MS adds this feature to Sql Server soon, as the .Net framework already has several JSON related classes.


Categories: Uncategorized

Load Excel Spreadsheet to table. Tool that I built

October 15, 2012

Very often we are asked to load data from excel spreadsheets into the DB for data analysis, a data fix etc. There are several out of the box ways to load data into the db, like SSIS, bulk insert and now powershell. All these methods work fine and do the job, but they carry quite a bit of learning overhead for non-db folks. The other problem is permissions: in almost every organization developers do not have write access beyond the Dev environment. So it comes to the DBAs to do the job, which becomes a PITA soon as requests keep pouring in from developers :) . To avoid these situations and help developers load data from excel to a table in the DB, I built one small tool in C#.

We allow developers to load data from excel into tables in one specific db, which can later be used to analyze data by joining with the primary DB. We built a process to clean up these temp tables after 15 days.

Here is a screenshot:


All the fields in the above form are self explanatory. I am using 2 different providers here, "Microsoft.Jet.OLEDB.4.0" and "Microsoft.ACE.OLEDB.12.0". If you don't have the latest version of MS Office (2010), then it may be the case that you don't have the latest driver. You need to download and install AccessDatabaseEngine_x64.exe from the Microsoft Download Center.
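As a point of comparison, the same ACE provider can also be used straight from T-SQL with OPENROWSET. A minimal sketch, assuming the provider is installed on the sql server box, 'Ad Hoc Distributed Queries' is enabled, and the file path and sheet name below are placeholders:

-- Ad hoc read of an .xlsx sheet through the ACE OLE DB provider.
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\temp\sample.xlsx;HDR=YES',
                'SELECT * FROM [Sheet1$]');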

If you don't want to download and install the latest driver, then you need to select the "Live in Cave" (no offense :-) ) option. This makes the tool use the Jet OLE DB provider to load the excel file. But there is one caveat with this option: you can't load excel files with the .xlsx extension. With the .xlsx extension, excel files are stored as xml, which the Jet OLE DB provider won't recognize. The workaround is to open the .xlsx file and save it as .xls; you should then be able to use the tool to load the .xls file.

If you don't select the "Existing Table" option, the tool will create a new table in the db with all columns as Varchar(20). If the first row in the excel sheet doesn't have column names, then arbitrary column names (F1, F2 etc.) are used for the new table.

Since we have db's in multiple domains with no trust between them, we allow only sql authentication to load data. I disabled Windows Authentication in the tool for now; I will enable it in the next version for general use.

This tool has really helped DBAs and developers quickly get data into the DB for a data fix or analysis, so give it a shot and see if it helps you the way it did us.

Disclaimer: This tool has been tested and used on several machines successfully, but there is still no guarantee that it will work fine on every machine, since a lot of environment related factors play a role during execution. Though I have not seen any security related issues while using it, use against a production db is not recommended.

App can be downloaded from here 

Happy Loading


Categories: Sql Server, Tool

Find and Replace string using Powershell

August 23, 2012

This will be a quick post on finding and replacing text in ssis config files using powershell. Recently we upgraded and moved our db's, which required modifying the config files to point to the new server names. Since we are using xml config files, it is tedious to go and check every config file and modify the server name.

If you have 1-2 config files then it is not a problem, but if you have more than 50, it doesn't make any sense to go into each file and modify it manually.

Again, Powershell to the rescue. Powershell makes it super easy to do this in one line of code.

Here is the powershell script:

get-childitem -recurse -include "*.dtsconfig" | select-string "OLDSERVERNAME" -list | % { (get-content $_.Path) | % { $_ -replace "OLDSERVERNAME", "NEWSERVERNAME" } | set-content $_.Path; write-host "Processed :- $($_.Path)" }

-recurse is used to recurse through all the folders in the directory.
-include is used to look only at files with the ".dtsconfig" extension.

If you just want to get the names of the files that would be modified, you can run the command below.

get-childitem -recurse -include "*.dtsconfig" | select-string "OLDSERVERNAME" | select Path

This searches for the string and prints the complete path and name of each matching file.

You can also group by filename and find out the number of times the string matches in each file.

get-childitem -recurse -include "*.dtsconfig" | select-string "OLDSERVERNAME" | group-object path | select Count, Name | format-table -auto

So here it is. Powershell again makes life easy.


Categories: PowerShell, SSIS

Table MetaData for Query Tuning – Quick and Dirty

August 7, 2012

An important part of life as a DBA is responding quickly during any disaster or performance issue. As much as it sucks, it is really challenging and interesting at times. You have to be ready always, with all the tools\scripts in your arsenal to identify and troubleshoot the problem. So this entry is again related to tuning, as have been some of my previous posts.

Several times I get scripts from developers with subjects like "running slow. please check". While looking into it, I try to gather some basic stats about the tables in the query. This information includes index fragmentation, index stats info, indexes and indexed columns. If the query is small and includes 1\2 tables, this task is easy: just select the table name and hit a shortcut (I heavily use shortcuts) that presents all the needed data. But if the query is complex and involves several tables, it becomes tiresome to get this information for each and every table in the same output.

I created the process below to take care of that. I created an SP that takes sql text as a parameter and executes it using sp_executesql (tip: use set fmtonly on to discard any output). This caches the plan. Then I query the cached plan for the recently executed sql text to get the table names. Once I have the table names, I query the metadata views to grab all the needed information.

Here is how it works.

Let's create sample tables and load some dummy data.

Create table Tableone(Id int identity(1,1) primary key ,item int ,dt datetime)
Create Table Tabletwo(Id int primary key ,price decimal(10,3) , dt1 datetime)

Below I am using Merge to load random data into both tables simultaneously.

Declare @i int, @d int
set nocount on
Set @i = 1

While @i <= 10000
Begin
    Merge TableOne as target
    Using (values (1)) as src (val)
    on 1 = 2
    when not matched
    then insert (item, dt) values (abs(checksum(rand())), dateadd(second, abs(checksum(rand()))/10, '20060101'))
    output inserted.Id, cast(abs(checksum(rand()))/100000000.0 as decimal(10,3)), inserted.dt
    into Tabletwo;
    set @i += 1
End

Create a couple of indexes on both tables to test our metadata query output.

create nonclustered index ncx_tableone on tableone(item ,dt)
create nonclustered index ncx_tabletwo_incl on tabletwo(dt1 desc ,price) include(id)

Finally, create the SP below to execute the sql text and query the plan cache to fetch table related metadata.

CREATE Proc nsp_getTableMetaData @sql nvarchar(4000)
as
Set nocount on

Declare @sql1 nvarchar(4000) = @sql

--Execute sql text to generate and cache the query plan

Exec sp_executesql @sql1

--Grab the cached plan for the text we just executed.
Select qp.query_plan
into #t from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
where qt.text like '%' + @sql1 + '%'

--Store table names in temp table for meta data query.
;with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

select distinct tbl.value(N'@Table', N'sysname') as [TableName] into #t2 from #t
cross apply #t.query_plan.nodes('//RelOp[./*/Object/@Database]') as qp(obj)
cross apply qp.obj.nodes('./*/Object') as obj(tbl)

--Finally join to the metadata views for every table found in the plan.
Select x.[TableName] as TableName, i.name as index_name, STATS_DATE(i.object_id, i.index_id) AS statistics_update_date
, f.avg_fragmentation_in_percent
, stuff(IndexCols.Colname, 1, 1, '') as index_columns
, stuff(IndexIncCols.Colname, 1, 1, '') as included_columns
, f.page_count
from sys.indexes i
Inner Join (select [TableName] from #t2) as x on i.object_id = Object_id(x.TableName)
Cross apply dbo.my_index_physical_stats(DB_ID(), Object_id(x.TableName), i.index_id, null, NULL) as f
Cross apply (SELECT ',' + cl.name +
                    CASE inc.is_descending_key WHEN 1 THEN ' DESC' ELSE '' END
             FROM sys.index_columns inc
             JOIN sys.columns cl ON inc.object_id = cl.object_id
                                AND inc.column_id = cl.column_id
             WHERE inc.object_id = i.object_id
             AND inc.index_id = i.index_id
             AND inc.is_included_column = 0
             ORDER BY inc.key_ordinal
             FOR XML PATH('')) AS IndexCols(Colname)
Outer apply (SELECT ',' + cl.name
             FROM sys.index_columns inc
             JOIN sys.columns cl ON inc.object_id = cl.object_id
                                AND inc.column_id = cl.column_id
             WHERE inc.object_id = i.object_id
             AND inc.index_id = i.index_id
             AND inc.is_included_column = 1
             ORDER BY inc.index_column_id
             FOR XML PATH('')) AS IndexIncCols(Colname)
where i.type in (1, 2)


I noticed that I can't use sys.dm_db_index_physical_stats directly in the query with cross apply; it was throwing a correlated subquery error. I binged and found a workaround in an old post from Paul Randal, where Paul blogged about working around this by creating a table-valued function wrapping sys.dm_db_index_physical_stats. That solution was for the 2005 version; for 2008 you just need to add one extra column to the TVF. You can check Paul's post to read more.

Here is a copy of the function:

CREATE FUNCTION my_index_physical_stats (
@database_id INT,
@object_id INT,
@index_id INT,
@partition_number INT,
@mode INT)
RETURNS @result TABLE (
database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,
index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,
index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,
avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,
record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,
min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,
forwarded_record_count BIGINT NULL, compressed_page_count BIGINT NULL)
AS
BEGIN
INSERT INTO @result
SELECT * FROM
sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode)
RETURN
END

Finally, you can test it by running the queries below. Add the set fmtonly option if you don't want to see the query output.

Exec nsp_getTableMetaData 'set fmtonly on;
declare @id int = 4 select t1.Id, t1.item, t2.price, t2.dt1
from tableone t1 inner join tabletwo t2 on t1.Id = t2.Id
where t1.Id = @id; set fmtonly off;'

You can even run this with parameter values.

Exec nsp_getTableMetaData 'declare @id int = 4 select t1.Id, t1.item, t2.price, t2.dt1
from tableone t1 inner join tabletwo t2 on t1.Id = t2.Id
where t1.Id = @id;'


Exec nsp_getTableMetaData 'select t1.Id, t1.item, t2.price, t2.dt1
from tableone t1 inner join tabletwo t2 on t1.Id = t2.Id
where t1.Id = 6;'

This has helped me a lot with my day to day query performance troubleshooting. It gets me the basic troubleshooting information fast enough to fix the issue, or to dig deeper into the query if nothing is wrong with the indexes or stats.

I plan to enhance it to include other statistics related to the query, like logical reads etc.
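As a rough sketch of that enhancement, something like the query below could be appended to the SP, reusing the same @sql1 variable and LIKE match the SP already relies on; all the columns come from sys.dm_exec_query_stats:

--Runtime stats for the cached text we just executed.
Select qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_writes,
       qs.total_worker_time,
       qs.total_elapsed_time
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) qt
where qt.text like '%' + @sql1 + '%';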

Check it out and let me know what you think.


Categories: Uncategorized

Script Table Index in SSMS Using Powershell

July 17, 2012

The most important part of a DBA's job is to regularly tune db queries. When you are tuning queries, you want to know the indexes created on a table, with column names etc. This tip is all about getting that information from the DB quickly.

There are several ways to get this information. You can expand the table in ssms and script indexes manually, which becomes tedious when locating the table if you have thousands of tables :) as in my case, or you can create your own sp querying the system tables and set it up as a shortcut key, which works great. But as I have been playing with powershell lately, I tried to do the same using powershell. There is one small caveat though: if you want to run a powershell script from sql server you will have to use the xp_cmdshell command. xp_cmdshell is disabled by default, so you may need to enable it first by running the command below:

Exec sp_configure 'xp_cmdshell', 1;

Reconfigure with override
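On a fresh instance 'xp_cmdshell' is hidden behind advanced options, so if the call above complains that the option doesn't exist or is an advanced option, the fuller sequence looks like this:

--'xp_cmdshell' is an advanced option, so expose it first.
Exec sp_configure 'show advanced options', 1;
Reconfigure with override;

Exec sp_configure 'xp_cmdshell', 1;
Reconfigure with override;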


I created the powershell script below and saved it on the server as script_indexes.ps1.

param([string]$Server, [string]$database, [string]$table)

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null

$srv = new-object Microsoft.SqlServer.Management.Smo.Server($Server)
$db = $srv.databases[$database]
$tb = $db.tables | where-object {$_.Name -like $table}

foreach ($index in $tb.Indexes)
{
    $index.script() + "GO `r`n "
}

Finally, create an SP in the DB which calls this powershell script and outputs the index script in the SSMS results window. For me this has been a really fast way to see the columns in an index, including included columns. It also helps if you want to modify indexes, as you already have the create index script ready. Here is the sp code.


Sample Call -- Exec nsp_script_indexes 'TableName'


CREATE PROCEDURE nsp_Script_Indexes

@TbName sysname

AS
BEGIN

SET NOCOUNT ON;

DECLARE @Servername AS sysname, @DbName AS sysname, @cmd AS VARCHAR(2000);

-- Temp table to store powershell command output.

CREATE TABLE #output
(
id           INT           IDENTITY,
Index_Script VARCHAR (MAX)
);

SET @Servername = @@SERVERNAME;

SET @DbName = DB_NAME();

--Change the path to the powershell file location on your server.

SET @cmd = 'PowerShell.exe "G:\PowerShell\script_indexes.ps1 ' + '"' + @Servername + '"' + ' "' + @DbName + '"' + ' "' + @TbName + '"' + '"';

--Print @cmd

INSERT INTO #output (Index_Script)

EXECUTE xp_cmdshell @cmd;

--Delete unnecessary rows from the temp table.

DELETE #output

WHERE  Index_Script IS NULL

OR Index_Script = '';

--Use the 'GO' word in the script to find the end of each create index statement.

SELECT row_number() OVER (ORDER BY id) AS rn,
       id

INTO   #opt

FROM   #output

WHERE  ltrim(rtrim(Index_Script)) = 'GO';

DECLARE @i AS INT, @c AS INT, @ind AS NVARCHAR (1000), @d AS INT, @n AS INT;

SET @d = 0;

--Finally concatenate the index script back and print it.

SELECT @i = MIN(rn),

@c = max(rn)

FROM   #opt;

WHILE (@i <= @c)

BEGIN

SET @n = @i;

SELECT @n = id

FROM   #opt

WHERE  rn = @i;

SELECT @ind = (SELECT CONVERT (VARCHAR (1000), Index_Script + '')

FROM   #output

WHERE  id < @n

AND id > @d

FOR    XML PATH (''));

SET @d = @n;

SET @i = @i + 1;

PRINT @ind + char(10) + 'GO';

END

END



Now you can add this SP call as a shortcut key in SSMS, and you are ready to quickly get index info while troubleshooting queries.

Sample Call -- Exec nsp_script_indexes 'TableName'

Happy tuning..


Categories: Sql Server

Avoid Trigger Execution – Special Cases

June 12, 2012

A quick blog post on how to avoid trigger execution during some ad hoc activity.

Today I needed to update around 12K rows in a table which had an after update trigger on it. As a special case, we didn't want the trigger to be executed by this update. I knew I could do it by creating a temp table in the update batch and then altering the trigger to return if that temp table exists (see the sketch right after this paragraph). We tried that and it worked great.
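Here is a minimal sketch of that temp-table trick; the table, column and temp table names are just placeholders, not the real objects from that day:

--Step 1: alter the trigger so the first thing it does is check for the signal temp table.
IF OBJECT_ID('tempdb..#skip_trigger') IS NOT NULL
    RETURN;

--Step 2: in the ad hoc batch, create the signal table, run the special update, then drop it.
CREATE TABLE #skip_trigger (dummy INT);

UPDATE dbo.SomeTable
SET    SomeColumn = 'newvalue'
WHERE  SomeKey = 1;

DROP TABLE #skip_trigger;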

But I still binged to see if there is any other way to accomplish this. I saw the same question in a forum, with the Context_Info() function mentioned as an answer. I didn't see any example code in that post, so just for testing I ran a couple of scripts to verify it, and it works great too; it is also a much cleaner way to get it done.

Scripts:

--Test table and some sample rows (column types assumed from the insert).
CREATE TABLE trtable (ID INT, gn VARCHAR(10))

INSERT INTO trtable (ID, gn)
SELECT 1, 'ab'
UNION ALL
SELECT 2, 'bc'
UNION ALL
SELECT 3, 'cd'

--SELECT * FROM trtable

Trigger Script:

CREATE TRIGGER tr_trtable_context   --trigger name is arbitrary
ON trtable
AFTER INSERT, UPDATE
AS
BEGIN
declare @c varbinary
set @c = CONTEXT_INFO()

If @c = 0x01

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'local',
@recipients = '',
@body = 'Trigger executed.',
@subject = 'blah-blah-and-blah';
END

Now if you run the dml below on trtable, it won't execute the trigger code (in my trigger, the email part), because the session's context info has not been set to 0x01.

INSERT INTO trtable (ID, Gn)
SELECT 4, 'ef'
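And going back to the original goal of skipping the trigger for one special update, the pattern I would use is the reverse check. A minimal sketch, assuming the trigger is altered to return when the session flag is set:

--Inside the trigger, first statement of the body: bail out when the session flag is set.
IF SUBSTRING(CONTEXT_INFO(), 1, 1) = 0x01
    RETURN;

--In the ad hoc batch doing the special update: set the flag, run the dml, reset the flag.
SET CONTEXT_INFO 0x01;

UPDATE trtable
SET    gn = 'zz'
WHERE  ID = 4;

SET CONTEXT_INFO 0x00;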

So here it is: another way to skip trigger execution in some special cases. I didn't get time to do further testing of the behaviour in explicit transactions or in other types of triggers. I will try to check it out if I get time.


Categories: Uncategorized