Archive

Archive for the ‘Uncategorized’ Category

Load JSON to SQL Server – Challenges and Alternatives (MongoDB)

February 17, 2013 2 comments

Have you ever tried loading JSON data into SQL Server? If yes, then you would agree that it is not easy to get done.
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is widely used everywhere now, mostly by web developers to build stateful web applications.

Simplified example :

{
        "name" : "James",
        "Age" : 50,
        "Favourite_Color" : "Purple"
}

In other words, a JSON document is a collection of key/value pairs that can be nested to several levels. A JSON document (every record is called a document in JSON) can contain several subdocuments with multiple levels of arrays and lists.
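
For illustration, a small made-up document with one nested subdocument and one array could look like this:

{
        "name" : "James",
        "Age" : 50,
        "Address" : { "City" : "San Francisco", "State" : "CA" },
        "Favourite_Colors" : [ "Purple", "Green" ]
}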

So, for one new project we had a requirement to load vendor data into SQL Server. This vendor data was in JSON format with a really complex structure and several levels of nesting. We were getting around 20 JSON files daily from the vendor that I needed to load into the DB.

There is no easy way to load JSON data into SQL Server: there is no in-built method/function or SSIS component for this task. I bingoogled to see how others are doing it, since this format is so widely used.
I didn't find anything except one Simple-Talk blog post from Phil Factor: Post

It is a really good article. The author created a T-SQL function that uses complex string manipulation to parse JSON data. I tried it first with some sample data and it worked fine, but the data sent to us was really huge (some of the files were around 1 GB) with crazy-complex nesting.
The process kept failing with one error or another. I tried to modify and enhance the function so that it would work for my dataset, but the file structure was too complex to make it work.
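
For reference, this is roughly how that parser is invoked on a small document. It assumes the dbo.parseJSON function from the Simple-Talk article has been created; the exact name and output columns are per that article, so treat this as a sketch only.

--Sketch: parse a small JSON string into a relational result set using the article's function.
SELECT *
FROM dbo.parseJSON('{ "name" : "James", "Age" : 50, "Favourite_Color" : "Purple" }')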

I have no idea why SQL Server doesn't provide an easy way to do this. I see there is a Connect item open for this feature with overwhelming support, but nothing from MS yet 😦 I also commented on it a while back. Please upvote if you are struggling to load JSON data into SQL Server.
Here is the Connect item link

Next I tried C# to convert the JSON data to XML and then used SSIS to load it into the DB. .NET has JSON serializer and deserializer classes available for this, but I used the Newtonsoft JSON library, which really made it easy. Once a file is converted to XML, SSIS loads it into the DB.

Well, this worked, but (there is always a but 🙂 ) setting up the XML load wasn't easy. First, the conversion to XML was slow, and then one of the converted XML files was loading into around 19 tables, so you can guess the nesting here. We were getting different files every now and then, which made the process difficult to maintain, but we had no option other than to live with it, with quite a bit of manual intervention every now and then 😦

Now let's talk about other alternatives (NoSQL).
There are several NoSQL platforms that make loading these JSON files easy: MongoDB, CouchDB, etc.
After all the dust settled I started looking into some of these options. MongoDB, for one, is really easy to set up and work with. I won't go into installation and setup as that is easily available on the MongoDB website. Once MongoDB is installed, loading JSON data is a piece of cake.
Now, to load a JSON file into the DB, I would run one single-line command.

At a command prompt in the bin directory:

>mongoimport --database orders --collection ordDetail < c:\orders.json

All done. File loaded. If the db (orders in my case) doesn't already exist, the mongoimport command above will create the database. A collection is the MongoDB equivalent of a table, and if the
collection (ordDetail in my case) doesn't already exist, it will be created too. MongoDB is a flexible-schema database system, which works great for JSON, CSV and TSV format files.
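
For example, a CSV file with a header row can be imported in much the same way using the standard --type and --headerline options (the collection and file names here are just illustrative):

>mongoimport --database orders --collection ordDetailCsv --type csv --headerline < c:\orders.csv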

Once the file is loaded, here are some basic MongoDB queries to check the data.

Count how many documents were loaded, i.e. the total number of documents in the ordDetail collection. This is equivalent to select count(*) from a table in relational DBs.

>db.ordDetail.count()

To fetch one row/document from the collection:

>db.ordDetail.findOne()

Sample output after masking data.

{
        "_id" : ObjectId("511dc7e3476eee41aa7a073a"),
        "name" : "Burroug's",
        "OrderNumber" : "9187157999-3909592",
        "Amount" : 48.4,
        "impression" : 0,
        "customField1" : "jhhgh10"
}

To query everything (select * from table):

>db.ordDetail.find()

I would blog about MongoDB query commands in future posts.

The application owner now wanted to move the application to MongoDB, but the project was put on hold, as several other challenges come into the picture when working with NoSQL DBs.

Argument continues 🙂

To conclude, that was my journey trying to load JSON into SQL Server. Quite a few other RDBMSs already have functionality to load JSON data. I hope MS adds this feature to SQL Server soon, as the .NET Framework already has several JSON-related classes.

Thanks,
Neeraj

Categories: Uncategorized

Table MetaData for Query Tuning – Quick and Dirty

August 7, 2012 Leave a comment

An important part of life as a DBA is responding quickly during any disaster or performance issue. As much as it sucks, it is really challenging and interesting at times. You have to always be ready with all the tools/scripts in your arsenal to identify and troubleshoot the problem. So this entry is again related to tuning, as have been some of my previous posts.

Several times I get scripts from developers with subjects like "running slow. please check". While looking into it, I try to gather some basic stats about the tables in the query: index fragmentation, index statistics info, indexes and indexed columns. If the query is small and includes 1-2 tables, this task is easy; just select the table name and hit a shortcut (I heavily use shortcuts) that presents all the needed data. But if the query is complex and involves several tables, it becomes tiresome to get this information for each and every table in the same output.

I created the process below to take care of that. I created an SP that takes SQL text as a parameter and executes it using sp_executesql (tip: use set fmtonly on to discard any output). This caches the plan. Then I query the cached plan for the recently executed SQL text to get the table names. Once I have the table names, I query the metadata views to grab all the needed information.

Here is how it works.

Let's create sample tables and load some dummy data.

Create table Tableone(Id int identity(1,1) primary key ,item int ,dt datetime)
Create Table Tabletwo(Id int primary key ,price decimal(10,3) , dt1 datetime)

Below I am using MERGE to load random data into both tables simultaneously.

Declare @i int , @d int
set nocount on
Set @i = 1

While @i <= 10000
Begin
Merge TableOne as target
Using (values (1)) as src (val)
on 1=2
when not matched
then insert (item , dt) values (abs(checksum(rand())),dateadd(second,abs(checksum(rand()))/10,'20060101'))
output inserted.id,cast(abs(checksum(rand()))/100000000.0 as decimal(10,3)) , inserted.dt
into Tabletwo;
set @i +=1
End

Create a couple of indexes on both tables to test our metadata query output.

create nonclustered index ncx_tableone on tableone(item ,dt)
create nonclustered index ncx_tabletwo_incl on tabletwo(dt1 desc ,price) include(id)

Finally, create the SP below, which executes the SQL text and queries the plan cache to fetch table-related metadata.

CREATE Proc nsp_getTableMetaData @sql nvarchar(4000)
as 
Begin
Set nocount on

Declare @sql1 nvarchar(4000) = @sql

--Execute sql text to generate query plan

Exec sp_executesql @sql1

select 
qp.Query_Plan
into #t from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
where qt.text like '%'+@sql1 +'%'

--Store table names in temp table for metadata query.
;with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

select distinct tbl.value(N'@Table',N'sysname') as [TableName] into #t2 from #t
cross apply #t.query_plan.nodes('//RelOp[./*/Object/@Database]') as qp(obj)
cross apply qp.obj.nodes('./*/Object') as obj(tbl)

Select x.[TableName] as TableName ,i.name as index_name,STATS_DATE(i.object_id, i.index_id) AS statistics_update_date
,f.avg_fragmentation_in_percent,stuff(IndexCols.Colname , 1,1,'') as index_columns ,stuff(IndexIncCols.Colname , 1,1,'') as included_columns, f.page_count
from sys.indexes i
Inner Join (select [TableName] from #t2) as x on i.object_id = Object_id(x.TableName)
Cross apply dbo.my_index_physical_stats(DB_ID(),Object_id(x.TableName),i.index_id,null,NULL) as f 
CROSS APPLY (SELECT ', ' + cl.name +
CASE inc.is_descending_key
WHEN 1 THEN ' DESC'
ELSE ''
END
FROM sys.index_columns inc
JOIN sys.columns cl ON inc.object_id = cl.object_id
AND inc.column_id = cl.column_id
WHERE inc.object_id = i.object_id
AND inc.index_id = i.index_id
AND inc.is_included_column = 0
ORDER BY inc.key_ordinal
FOR XML PATH('')) AS IndexCols(Colname)
OUTER APPLY (SELECT ', ' + cl.name
FROM sys.index_columns inc
JOIN sys.columns cl ON inc.object_id = cl.object_id
AND inc.column_id = cl.column_id
WHERE inc.object_id = i.object_id
AND inc.index_id = i.index_id
AND inc.is_included_column = 1
ORDER BY inc.index_column_id
FOR XML PATH('')) AS IndexIncCols(Colname)
where i.type in (1,2)

End
go

I noticed that I can't use sys.dm_db_index_physical_stats directly in the query with CROSS APPLY; it was throwing a correlated-subquery error. I Bing'd around and found a workaround in an old post from Paul Randal (sqlskills.com), where Paul blogged about wrapping sys.dm_db_index_physical_stats in a table-valued function. That solution was for the 2005 version; for 2008 you just need to add one extra column to the TVF. You can check the link above to read more.

Here is a copy of the function:

CREATE FUNCTION my_index_physical_stats (
@database_id INT,
@object_id INT,
@index_id INT,
@partition_number INT,
@mode INT)
RETURNS @result TABLE (
database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,
index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,
index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,
avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,
record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,
min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,
forwarded_record_count BIGINT NULL,compressed_page_count BIGINT NULL)
BEGIN
INSERT INTO @result SELECT * FROM
sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode)
RETURN
END;
GO

Finally, you can test it by running the queries below. Add the set fmtonly option if you don't want to see the query output.

Exec nsp_getTableMetaData ' set fmtonly on;
declare @id int = 4 select t1.id , t1.item , t2.price , t2.dt1
from tableone t1 inner join tabletwo t2 on t1.id = t2.id
where t1.id = @id ; set fmtonly off;'

You can even run this with parameter values.

Exec nsp_getTableMetaData 'declare @id int = 4 select t1.id , t1.item , t2.price , t2.dt1
from tableone t1 inner join tabletwo t2 on t1.id = t2.id
where t1.id = @id;'

Or

Exec nsp_getTableMetaData 'select t1.id , t1.item , t2.price , t2.dt1
from tableone t1 inner join tabletwo t2 on t1.id = t2.id
where t1.id = 6 ;'

This has helped me a lot with my day-to-day query performance troubleshooting. It gets me the basic information fast, so I can either fix the issue right away or dig deeper into the query if nothing is wrong with the indexes or stats.

I plan to enhance it to include other statistics related to the query, like logical reads etc.; a rough sketch of that idea is below.
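
A minimal sketch of what that enhancement could look like, appended inside the SP so that the @sql1 parameter is in scope. It reuses the same LIKE match on the SQL text that the SP already uses; the columns come from sys.dm_exec_query_stats.

--Sketch only: runtime stats for the plan cached by the SP above (@sql1 is the proc parameter).
select top (5)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_logical_reads / nullif(qs.execution_count, 0) as avg_logical_reads,
    qs.total_worker_time,
    qs.total_elapsed_time,
    qs.last_execution_time
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.plan_handle) qt
where qt.text like '%' + @sql1 + '%'
order by qs.last_execution_time desc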

Check this out and let me know what you think.

-NJ

Categories: Uncategorized

Avoid Trigger Execution – Special Cases

June 12, 2012 Leave a comment

A quick blog post on how to avoid trigger execution during some ad-hoc activity.

Today I needed to update around 12K rows in a table which had an AFTER UPDATE trigger on it. As a special case, we didn't want the trigger to be executed after the update. I knew I could do it by creating a temp table in the update batch and then altering the trigger to return if that temp table exists (see the sketch below). We tried that and it worked great.
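
A minimal sketch of that temp-table approach, using the trtable table from the scripts further down; the #skip_trigger name and its column are purely illustrative:

--Inside the trigger body, before any other work:
IF OBJECT_ID('tempdb..#skip_trigger') IS NOT NULL
    RETURN

--In the ad-hoc batch that should bypass the trigger logic:
CREATE TABLE #skip_trigger (id int)
UPDATE trtable SET gn = 'zz' WHERE ID = 1
DROP TABLE #skip_trigger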

But I still binge'd to see if there is any other way to accomplish this. I saw the same question in a forum, with the Context_Info() function mentioned as an answer. There was no example code in the post, so just for testing I ran a couple of scripts to verify it. It works great too, and it is also a much cleaner way to get it done.

Scripts :

CREATE TABLE trtable (ID INT, gn VARCHAR(10))

GO

INSERT INTO trtable (ID, gn)
SELECT 1, 'ab'
UNION ALL
SELECT 2, 'bc'
UNION ALL
SELECT 3, 'cd'

GO

--SELECT * FROM trtable

Trigger Script :

CREATE TRIGGER trigtest
ON trtable
AFTER UPDATE
AS
declare @c varbinary(128)
set @c = CONTEXT_INFO()

If @c = 0x01
Begin
Return
End

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'local',
@recipients = 'njandwani@local.com',
@body = 'Trigger executed.',
@subject = 'blah-blah-and-blah';
GO

Now if you run the DML below on trtable, it won't execute the trigger code (in my trigger, the email part). Since this is an AFTER UPDATE trigger, I am testing with an update:

SET CONTEXT_INFO 0x01

UPDATE trtable SET gn = 'ef' WHERE ID = 3
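
Once the special-case DML is done, I would clear the context info so that later updates fire the trigger normally again, for example:

SET CONTEXT_INFO 0x00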

So here it is: another way to skip trigger execution in some special cases. I didn't get time to do further testing on the behaviour inside explicit transactions
or in other types of triggers. I will try to check it out if I get time.

-NJ

Categories: Uncategorized

Script Indexes on all Tables using PowerShell.

September 19, 2011 1 comment

This is going to be a quick post. This morning I was asked to script only the indexes on all tables in a DB.

First I searched around in SSMS to see if there is any option to script only the indexes on all tables, but no luck. Finally, after an hour of work with my little knowledge of PowerShell, I was able to generate all the index scripts. This is going into my scriptbox for future use.

In SSMS, browse to Tables and right-click, then select Start PowerShell. This opens a PowerShell console window with the path set to the Tables node. Run the script below to script out only the indexes.

Script :

$tbl = gci | select-object name
foreach($t in $tbl)
{
$path = ""
$path = "dbo."+$t.name+"\indexes\"
gci -Path $path | %{$_.Script() | out-file c:\index.txt -append; "GO `r`n " | out-file c:\index.txt -append; $_.Name;}
}

There are probably several better ways to do this in PowerShell, but for now this script gets the job done. I will try to modify it in a future version to add options to drop/create indexes rather than create only, if I get the chance 🙂

NJ

Categories: Uncategorized

Recover Lost .RDL Files *

May 17, 2011 1 comment

 

Recently, per a company-wide policy, my office machine was upgraded to Windows Server 2008. Even after backing up everything, I lost a lot of my precious technical arsenal, including my diagnostic scripts and a number of .rdl files that I had created and published for my DB admin work. For a moment I was distraught. The thought of creating more than 50 reports again was painful, but right at that moment I realized that I had not deleted my old ReportServer DB. Kind of a blessing out of nowhere.

But again, I wasn't sure whether the ReportServer DB keeps the definition of every RDL and data source. After 30 minutes of table surfing I found out it does. Voila!

I attached my old ReportServer DB and queried the table dbo.Catalog. The Catalog table keeps the definition of all the data sources and reports. It has a column named "Content" which is of the image datatype.

You can run the query below to get the definition of the data sources and .rdl files from the Catalog table. The value of the Type column distinguishes between data sources and reports.

For data sources Type is 5 and for reports Type is 2, so to get the definition of the reports in XML format we can run the query below.

SELECT NAME, CAST(CAST(CONTENT AS VARBINARY(MAX)) AS XML) AS ReportDef
FROM dbo.catalog
WHERE TYPE = 2

 

For data sources:

SELECT NAME, CAST(CAST(CONTENT AS VARBINARY(MAX)) AS XML) AS ReportDef
FROM dbo.catalog
WHERE TYPE = 5

This outputs the definitions in XML format.

Save the XML output as .rdl files for reports and .rds files for data sources. All set; the reports and data sources are ready again.

I had to make a couple of other changes in the .rds files, which might be specific to my environment. I was seeing some XML format errors, so I created a dummy data source and grabbed a couple of XML-format-specific lines from it.

That saved me hours of time that I would have spent creating reports and re-writing all the scripts.

Good luck until next tidbit.

–NJ

Categories: Uncategorized

My First Writing–Intro

May 11, 2011 Leave a comment

 

My name is Neeraj Jandwani. I live in the San Francisco Bay Area and work as a Senior SQL Server DBA. Time just flies; I didn't realize it has been 10 years that I have been working with SQL Server. Yes, I know I am a latecomer to the blogging world, but it's never too late. Here I will be blogging about my experiences with SQL Server and related technologies. I hope I don't procrastinate and start blogging for real this time.

Ok..Will be back soon.

Neeraj

Categories: Uncategorized