![]() |
Home
Generate Test Data Quickly In Two Easy Steps
If you've paid for a testing tool, or have something like Visual Studio Premium, you can quickly generate data that's appropriate to the schema of your database. If not, and you want to do it quickly and for free, here's a simple two-step way to generate a sample set of highly customised test data, of say, 1,000 to 100,000 rows, to run through your small database or your excel/google workbook system. It obviously depends on your requirements, but a quick sample based on some customised functions could probably be generated in 15 to 30 minutes, using the following two step approach: 1. Create Dummy Data
2. Add Custom FieldsAfter opening the file in Excel, we now address the issue of Fields that were missing the type you wanted. Now we can get creative with Excel. For example, say we want to generate a currency code, but where most of the currencies are in GBP. I guess its possible to do this in Mockaroo, but I haven't worked out how, and besides, there may be other more complex functions that Mockaroo couldn't handle. Instead, we can use Excel to generate a function such as this:
If that's confusing to you, then lets back up and explain the recipe, because this is a useful formula.
First, the RAND() function generates a random number between 0 and 1. This can obviously be used to generate random percentages between 0% to 100% (because 0.10 = 10%, 0.20 = 20% etc). This will be used to generate the probabilities for our Currency Code being called. Second, the MATCH function matches the Random Number to a number in the list between curly braces. Note the MATCH function has 1 as the last argument. This last argument is for the Type, and 1 means it will perform a match on the next lowest value (normally the last argument is set to 0, which means perform an exact match). Hence, a random value of 0 to 0.89 would match the lowest figure of 0% (ie. the 1st item in the list). And random values of 0.90 and 0.91 would match 90% (which is the 2nd item in the list). If I were to replace RAND() with 0.58 (or 58%), this would match the first number on the lower side, or 0% in this case. Thus, the MATCH function would return 1. If I were to replace RAND() with 0.97, this would match the 0.95 which is the fourth-placed element of the array in which to match. Thus the MATCH function would return 4. Now, my test data required 90% to be denominated in GBP, 2% in USD, 3% in SIN, 3% in HKD, and 2% in EUR. In which case, my MATCH formula uses equivalent ranges of 0%-90%, 90%-92%, 92%-95%, 95%-98%, and 98%-100%. (Equivalent in that 90-0% = 90%, and 92-90% = 2%, etc).
We now move onto the CHOOSE function. This has a number as the first argument, then a comma-separated list of arguments afterwards. The number in first argument 'chooses' which of the subsequent list to return. Note of course that we had to have 5 numbers in the original MATCH lookup array, so that we can generate integers from 1 to 5 to exactly match the 5 currency codes. In other words, the numbers between the curly braces, {0, 0.90...0.98}, match the number of currency codes, "GBP","USD",...,"EUR". So, if we just look at the CHOOSE function:
And now, if we follow the complete formula all the way through:
Hence we have a function that generates Currency Codes based on a probability of appearance. What's LeftWell, you've now got either your raw test data from Mockaroo in some format, perhaps CSV or SQL, or your raw test data in Excel, where it can be saved in CSV format. And in Excel, you can even generate the SQL if you want to add some other functions at the end. For example, what I'm referring to here is a function that builds your SQL, such as something like:
If you want to generate further data from that 1,000 rows, you could of course repeat the process and generate more data from Mockaroo. Or, to use the above example, its possible that randomly generated currencies may be sufficient for your needs, particularly if you have calculated fields based on that currency field. Hence, each time you hit F9 (recalculate), a new set of random data will be generated. This way, you could potentially generate 1,000's more simple data sets. |