Home‎ > ‎

Generate Test Data Quickly In Two Easy Steps

posted May 7, 2015, 12:19 PM by Chris Adams   [ updated May 17, 2015, 11:54 AM ]

If you've paid for a testing tool, or have something like Visual Studio Premium, you can quickly generate data that's appropriate to the schema of your database.

If not, and you want to do it quickly and for free, here's a simple two-step way to generate a sample set of highly customised test data, of say, 1,000 to 100,000 rows, to run through your small database or your excel/google workbook system. It obviously depends on your requirements, but a quick sample based on some customised functions could probably be generated in 15 to 30 minutes, using the following two step approach:

1. Create Dummy Data

  • First, obviously decide what columns, or fields, you require. For each column decide on what kind of data generation plan you need. For example, a domain of addresses, or perhaps a list generated by weighted probability. In the test harness, perhaps the column needs to be generated by a calculation.

  • Use an online data generator, such as Mockaroo. Mockaroo is quite good and provides the ability to generate such data types and fields as Booleans, Functions, URLs, Normal Distributions, Geo Coordinates, and JSON data arrays, as well as typical fields such as Names, Addresses, and Emails. It also allows you to generate blanks. If you can't find a field type that matches your domain or needs, don't worry, we can most likely fix that in a spreadsheet, and I'll show you a solution to that shortly. For now, you can just type in the field name and choose Blank.

  • Choose the number of rows, generate the data, and then save it in your preferred format. If the above is sufficient for dummy data for your database, you could choose the SQL option, or even the CSV option and then import it straight into your database table/s. For the purposes of this, we'll sidestep that step and save 1,000 rows of data as 'Excel' so that we can add some more fancy test data.


2. Add Custom Fields

After opening the file in Excel, we now address the issue of Fields that were missing the type you wanted. Now we can get creative with Excel. For example, say we want to generate a currency code, but where most of the currencies are in GBP. I guess its possible to do this in Mockaroo, but I haven't worked out how, and besides, there may be other more complex functions that Mockaroo couldn't handle.

Instead, we can use Excel to generate a function such as this:


If that's confusing to you, then lets back up and explain the recipe, because this is a useful formula.

The MATCH & RAND Component


First, the RAND() function generates a random number between 0 and 1. This can obviously be used to generate random percentages between 0% to 100% (because 0.10 = 10%, 0.20 = 20% etc). This will be used to generate the probabilities for our Currency Code being called.

Second, the MATCH function matches the Random Number to a number in the list between curly braces. Note the MATCH function has 1 as the last argument. This last argument is for the Type, and 1 means it will perform a match on the next lowest value (normally the last argument is set to 0, which means perform an exact match).

Hence, a random value of 0 to 0.89 would match the lowest figure of 0% (ie. the 1st item in the list). And random values of 0.90 and 0.91 would match 90% (which is the 2nd item in the list).

If I were to replace RAND() with 0.58 (or 58%), this would match the first number on the lower side, or 0% in this case. Thus, the MATCH function would return 1.

If I were to replace RAND() with 0.97, this would match the 0.95 which is the fourth-placed element of the array in which to match. Thus the MATCH function would return 4.

Now, my test data required 90% to be denominated in GBP, 2% in USD, 3% in SIN, 3% in HKD, and 2% in EUR. In which case, my MATCH formula uses equivalent ranges of 0%-90%, 90%-92%, 92%-95%, 95%-98%, and 98%-100%.

(Equivalent in that 90-0% = 90%, and 92-90% = 2%, etc).

The CHOOSE Component

=CHOOSE(rand_integer_1_to_5, "GBP","USD","SIN","HKD","EUR")

We now move onto the CHOOSE function.

This has a number as the first argument, then a comma-separated list of arguments afterwards. The number in first argument 'chooses' which of the subsequent list to return.

Note of course that we had to have 5 numbers in the original MATCH lookup array, so that we can generate integers from 1 to 5 to exactly match the 5 currency codes. In other words, the numbers between the curly braces, {0, 0.90...0.98}, match the number of currency codes, "GBP","USD",...,"EUR".

So, if we just look at the CHOOSE function:

  • CHOOSE(1, "GBP","USD","SIN","HKD","EUR") would result in a value of GBP
  • CHOOSE(4, "GBP","USD","SIN","HKD","EUR") would result in a value of HKD

And now, if we follow the complete formula all the way through:

  • a RAND() of 0.58 would result in a MATCH of 1, which would result in a CHOOSE of GBP.
  • a RAND() of 0.97 would result in a MATCH of 4, which would result in a CHOOSE of HKD.

Hence we have a function that generates Currency Codes based on a probability of appearance.

What's Left

Well, you've now got either your raw test data from Mockaroo in some format, perhaps CSV or SQL, or your raw test data in Excel, where it can be saved in CSV format. And in Excel, you can even generate the SQL if you want to add some other functions at the end. For example, what I'm referring to here is a function that builds your SQL, such as something like:

="INSERT INTO tableName VALUES (" & A2 & "," & B2 & ","... & ")".

Of course, you could also add other fields. For instance, again using the above example, we could build calculations based on the randomly generated currency fields. Once you've made sure you've got the calculations right, you then have a customised set of test data with 1,000 rows of data.

If you want to generate further data from that 1,000 rows, you could of course repeat the process and generate more data from Mockaroo. Or, to use the above example, its possible that randomly generated currencies may be sufficient for your needs, particularly if you have calculated fields based on that currency field. Hence, each time you hit F9 (recalculate), a new set of random data will be generated. This way, you could potentially generate 1,000's more simple data sets.