Saturday, 10 January 2009

T-SQL Random Data Generator

I always think that creating a component to generate random data for a table in SQL Server should be easy but it never is. I’ve written a few windows components in the past to do this but I always leave them in random folders and lose them in my sock draw. So I’ve spend a bit of time writing a stored procedure that will do the job – so I can keep it associated with the database du jour and not lose it. The proc isn’t perfect but then neither am I.

What it does
  1. Will output an insert T-SQL statement for any named table in the database.
  2. Only tested on SQL Server 2005. The INFORMATION_SCHEMA use should make it forwardly compatible but doesn’t do backwards compatibly any favours.
  3. Detects foreign keys and inserts and makes a random but valid link to the child table (this was the hard part)
  4. Detect different data types and do the necessary SQL formatting
  5. Accounts for max character length in string type fields
  6. Accounts for identity columns
  7. Detects nullable columns and leaves them null

What it doesn’t do
  1. Doesn’t account for unique indexes
  2. There are omitted datatypes (cursor, timestamp, uniqueidentifier, binary, varbinary, image)

Horrible hacks
Where the foreign key links where linked to child table via a unique index rather than a primary key it assumes a foreign key of 1. The INFORMATION_SCHEMA views don’t include index information. So this would necessitate delving into the sys.index table. Not impossible but the proc was complicated enough as it was. I didn’t need it so I didn’t do it.

Assumptions
  1. Foreign keys are of a numeric type i.e. int, float et. al.
  2. Foreign key constraints exist on the database. It isn’t magic – it can’t work out the relationships if they don’t exist.


Create Procedure usp_RandomInsertGenerator
(
@TableName nvarchar(128)
)
As

Begin

Set NoCount On

Declare @ColumnName nvarchar(128)
Declare @FK_TableName nvarchar(128)
Declare @FK_ColumnName nvarchar(128)
Declare @ConstraintName nvarchar(128)
Declare @DataType nvarchar(128)
Declare @CharacterMaximumLength int
Declare @Sql nvarchar(max)
Declare @MaxValue int
Declare @InsertValue nvarchar(200)
Declare @SqlOutputFields nvarchar(max)
Declare @SqlOutputValues nvarchar(max)
Declare @FirstLoop bit
Declare @IsIdentity bit
Declare @StringInsert bit
Declare @DummyText varchar(48)

Set @FirstLoop = 1
Set @SqlOutputFields = 'INSERT INTO ' + @TableName + ' ('
Set @SqlOutputValues = ' VALUES ('
Set @DummyText = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

-- cursor loops through every column for named table
Declare procCursor CURSOR FORWARD_ONLY
For
Select
COLUMN_NAME,
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH
From
INFORMATION_SCHEMA.COLUMNS col
Where
TABLE_NAME = @TableName
And IS_NULLABLE = 'NO'
Order By
ORDINAL_POSITION

Open procCursor
Fetch Next From procCursor Into @ColumnName, @DataType, @CharacterMaximumLength

while @@fetch_status <> -1
Begin

-- datatypes i haven't bothered implementing
if @DataType = 'cursor' OR @DataType = 'timestamp'
Or @DataType = 'uniqueidentifier' Or @DataType = 'binary'
Or @DataType = 'varbinary' Or @DataType = 'image'
Begin
Raiserror('Unsupported Data Type', 1, 16)
End

--reset variables
Set @FK_TableName = ''
Set @FK_ColumnName = ''
Set @StringInsert = 0
Set @ConstraintName = ''

-- Don't add in an insert value if the loop is an identity
Select @IsIdentity = COLUMNPROPERTY(OBJECT_ID(@TableName),@ColumnName,'IsIdentity')
if @IsIdentity = 1
Begin
Fetch Next From procCursor Into @ColumnName, @DataType, @CharacterMaximumLength
continue
End

-- getting the value to be inserted for this data type
if @DataType = 'varchar' Or @DataType = 'nvarchar'
Or @DataType = 'char' Or @DataType = 'nchar'
Or @DataType = 'text' Or @DataType = 'ntext'
Begin
Set @StringInsert = 1
Set @InsertValue = SubString(@DummyText, 1, @CharacterMaximumLength)
End
Else if @DataType = 'datetime' Or @DataType = 'smalldatetime'
Begin
Set @StringInsert = 1
Set @InsertValue = Cast(GetDate() as varchar(20))
End
Else -- it is some key of numeric type
Begin

-- getting the child table indexes
Set @Sql = '
Select
@FK_TableName = pkconst.TABLE_NAME,
@FK_ColumnName = pkconst.COLUMN_NAME,
@ConstraintName = coluse.CONSTRAINT_NAME
From
INFORMATION_SCHEMA.KEY_COLUMN_USAGE coluse
Join INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS fkconst
On fkconst.CONSTRAINT_NAME = coluse.CONSTRAINT_NAME
Left Join INFORMATION_SCHEMA.KEY_COLUMN_USAGE pkconst
On pkconst.CONSTRAINT_NAME = fkconst.UNIQUE_CONSTRAINT_NAME
Where
coluse.TABLE_NAME = @TableName
And coluse.COLUMN_NAME = @ColumnName'


Execute sp_executesql @Sql, N'@TableName nvarchar(128),
@ColumnName nvarchar(128),
@FK_TableName nvarchar(128) OUTPUT,
@FK_ColumnName nvarchar(128) OUTPUT,
@ConstraintName nvarchar(128) OUTPUT',
@TableName=@TableName,
@ColumnName=@ColumnName,
@FK_TableName=@FK_TableName OUTPUT,
@FK_ColumnName=@FK_ColumnName OUTPUT,
@ConstraintName=@ConstraintName OUTPUT

if Len(@FK_TableName) > 0 And Len(@FK_ColumnName) > 0
Begin

/* have found foreign key and the lookup table
so pick a random primary key from the lookup table */
Set @Sql = 'Select top 1 @InsertValue = Cast(' + @FK_ColumnName +
' as varchar(200)) From '
+ @FK_TableName + ' Order By newid()'
Execute sp_executesql @Sql, N'@InsertValue nvarchar(128) OUTPUT', @InsertValue=@InsertValue OUTPUT

End
Else if(Len(@ConstraintName) > 0)
Begin

/* OK we've found the foreign key constraint but have no idea what the
lookup table is. This is because we are joining on a unique index to the
lookup table not a primary key. Make a MASSIVE assumption in this instance
- that the lookup table has a link value of 1 */
Set @InsertValue = '1'

End
Else
Begin
-- no foreign key so the max that can be inserted is based on the datatype
-- don't bother with any thing greater than thirty thousand - big enough
if @DataType = 'bit'
Set @MaxValue = 1
else if @DataType = 'tinyint'
Set @MaxValue = 255
else if @DataType = 'smallint'
Set @MaxValue = 32767
else
Set @MaxValue = 32767

-- randomly generate a number to insert up to maximum
Set @InsertValue = Cast(ROUND(((@MaxValue - 1) * RAND() + 1), 0) as varchar(200))

End
End -- end of numeric processing

-- building up output string
Declare @Delimiter char(1)
if @FirstLoop = 1
Begin
Set @FirstLoop = 0
Set @Delimiter = ''
End
Else
Begin
Set @Delimiter = ','
End

Set @SqlOutputFields = @SqlOutputFields + @Delimiter + @ColumnName

if @StringInsert = 1
Begin
Set @SqlOutputValues = @SqlOutputValues + @Delimiter + '''' + @InsertValue + ''''
End
Else
Begin
Set @SqlOutputValues = @SqlOutputValues + @Delimiter + @InsertValue
End

Fetch Next From procCursor Into @ColumnName, @DataType, @CharacterMaximumLength
End -- finished this column = go to next

close procCursor
deallocate procCursor

-- outputting the sql string
Set @SqlOutputFields = @SqlOutputFields + ')'
Set @SqlOutputValues = @SqlOutputValues + ')'

select @SqlOutputFields + ' ' + @SqlOutputValues

End
Go


It’s very easy to create a wrapper sproc to get multiple statements.
One row of random data isn’t that much use – 10,000 might be.



Create Procedure usp_RandomInsertGeneratorWrapper
(
@TableName nvarchar(128),
@NumberOfInserts int
)
As
Begin

Set NoCount On

Declare @StatementCount int
Set @StatementCount = 0

While @StatementCount < @NumberOfInserts
Begin

exec usp_RandomInsertGenerator @TableName
Set @StatementCount = @StatementCount + 1
End

End

Room for improvement
Here’s some extensions that my be useful for individual projects
  1. Implement missing datatypes. Things like uniqueidenifier would be very easy indeed to implement. It’s just that I didn’t need it.
  2. Create custom inserts for know field names to give more realistic data for your application
  3. Randomise the character data properly.
So that’s it. Live it, love it and fiddle about with it a bit.

No comments: