Generating Mock Data with Perl

Written by ethan.jarrell | Published 2018/06/21
Tech Story Tags: programming | mock-data-with-perl | data-with-perl | perl | perl-data

Recently at work, our team has been writing tests that mock data, or in some cases simply mock a successful return value, so that we can more easily isolate the specific blocks of code we are working on. I’m sure this is a fairly common practice in most Test-Driven Development (TDD) scenarios. However, it got me thinking about an API I used when I began learning JavaScript, which generates random users: https://randomuser.me/

This is a great resource for mocking User Data. You might think of it as a Lorem Ipsum for randomly generated Users. As I was thinking about this neat tool, and how our team mocks other types of data, I began thinking about how I would go about making my own User Data Generator, similar to the above API.

If you want to see the source code for the project, here’s what I have so far:

https://github.com/ethanjarrell/randomUsers

The below examples will show how I thought about this process, as well as some code snippets of how I did it. First of all, I needed to know the types of things I would like to generate about each user. Here’s the list I came up with:

  • Name
  • Age
  • Gender
  • Ethnicity
  • Birthday
  • Email
  • Phone Number
  • Address
  • Occupation
  • Hobbies
  • Eye Color
  • Hair Color
  • Skin Color
  • Height
  • Weight
  • Body Type
  • Marital Status
  • Spouse ( if married )
  • Children
  • Highest Level of Education
  • School
  • Degree
  • Student Loan Debt
  • Annual Salary
  • Religion

In hindsight, I likely went overboard with this list. But it’s an interesting programming challenge. My thought process is that you could do this in one of two ways. First, you could create thousands of profiles, and when running the code, just return one out of the thousands at random.

The second option, which I thought would be far less time-consuming, would be to generate each new person at random, producing each trait with some degree of randomness without relying on any database of completed profiles. For example, each time you run the code, it would generate each trait from a list or by other means, and piece them all together to create a new user profile.

In doing this, I realized that some traits depend on other traits, while others are completely independent. For example, you can’t choose Name and Gender independently at random. Otherwise you may end up with a Man named Isabella, or a Woman named Bucky. Not that there’s anything wrong with that, but for consistency’s sake, I wanted to stay somewhat within social norms.

Other subtly dependent traits would be things like hobbies and weight/body type. For example, if we choose those traits at random, we may end up with a person whose hobbies include triathlons, but whose body type is overweight. Again, nothing wrong here, but the point is to generate consistent data.

Another example might be education / job type and annual salary. Again, choosing at random, we could generate a person with a master’s degree in electrical engineering and an annual salary of $12,000.

In order to tackle this in a way that makes sense, I started by first generating all the data that wouldn’t need to rely on one another. And then, depending on what is generated from my initial few traits, I can generate the remaining traits from a more isolated subset of data.
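
One way to picture this ordering is a lookup table keyed by an independent trait. The following is a minimal sketch, not the code from the repository: the %salary_range table, its dollar figures, and the random_salary helper are all hypothetical, invented here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical salary bands (in dollars) keyed by education level.
# The figures are invented for illustration only.
my %salary_range = (
    'High School'      => [ 20_000, 45_000 ],
    'Bachelors Degree' => [ 40_000, 90_000 ],
    'Masters Degree'   => [ 60_000, 130_000 ],
);

# Return a random whole-dollar salary inside the band for $education.
sub random_salary {
    my ($education) = @_;
    my ( $min, $max ) = @{ $salary_range{$education} };
    return $min + int( rand( $max - $min + 1 ) );
}

# Step 1: pick the independent trait at random.
my @levels    = sort keys %salary_range;
my $education = $levels[ rand @levels ];

# Step 2: derive the dependent trait from the subset that fits it.
my $salary = random_salary($education);

print "$education: \$$salary\n";
```

With a table like this, a Masters Degree can never be paired with a $12,000 salary, because the salary is only drawn after the education level is known.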

To start off my Perl file, I’m going to load a few Perl modules. Most of them ship with Perl, while DBI and JSON are installed from CPAN:

#!/usr/bin/perl
use DBI;
use strict;
use warnings;
use lib qw(..);
use JSON qw( );
use Data::Dumper;

In this example, I haven’t actually utilized DBI, but if you wanted to write each newly generated User to a Database, DBI might come in handy. I will be using lib and JSON however.

To start off, I went online and found lists of possible values for specific traits. These are all included in the GitHub link above. After cleaning the data, removing spaces, and adding commas, I loop through each list, pushing each entry into an array. Then, when I want to make a random selection from the array, I can use Perl’s built-in rand function to get a random entry.

There are quite a few lists I’m using to pull data from, but I basically use the same format to pull the data from the list. Here’s an example of how I might create the list for both male and female names:

# path to the text file of comma-separated male names
my $maleNames = 'maleNames.txt';

# open the text file and read it line by line
open(my $maleFh, '<', $maleNames) or die "Can't open $maleNames for read: $!";
my @maleLines;
while (my $line = <$maleFh>) {
    chomp $line;
    push(@maleLines, $line);
}
close $maleFh or die "Cannot close $maleNames: $!";

# split every line on commas and collect all of the names
my @maleList = ();
foreach my $x (@maleLines) {
    push(@maleList, split(',', $x));
}

# path to the text file of comma-separated female names
my $femaleNames = 'femaleNames.txt';

# open the text file and read it line by line
open(my $femaleFh, '<', $femaleNames) or die "Can't open $femaleNames for read: $!";
my @femaleLines;
while (my $line = <$femaleFh>) {
    chomp $line;
    push(@femaleLines, $line);
}
close $femaleFh or die "Cannot close $femaleNames: $!";

# split every line on commas and collect all of the names
my @femaleList = ();
foreach my $x (@femaleLines) {
    push(@femaleList, split(',', $x));
}
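
Since every list is loaded the same way, the repeated code could be folded into one subroutine. The following is a sketch under stated assumptions rather than the code from the repository: load_list is a hypothetical helper name, and the temporary file only stands in for a real file such as maleNames.txt so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file of comma-separated values and
# return every non-empty entry with surrounding whitespace trimmed.
sub load_list {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path for read: $!";
    my @items;
    while ( my $line = <$fh> ) {
        chomp $line;
        for my $item ( split /,/, $line ) {
            $item =~ s/^\s+|\s+$//g;
            push @items, $item if length $item;
        }
    }
    close $fh or die "Cannot close $path: $!";
    return @items;
}

# Demo data written to a temporary file that stands in for maleNames.txt.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "James, John,Robert\nMichael,William\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my @names = load_list($tmp_path);
print join( '|', @names ), "\n";    # James|John|Robert|Michael|William
```

In the script itself, each list would then take one line, for example load_list('maleNames.txt') and load_list('femaleNames.txt').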

Now, I could simulate a coin toss as follows:

my $gender = int(rand(2));

This would generate a number for Gender of either 0 or 1.

Then I could use a conditional:

if ( $gender == 0 ) {
    $gender = "female";
} else {
    $gender = "male";
}

With gender decided at random, we can pull a new record at random from either the male or female name list, depending on the outcome of the randomly chosen gender.
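
A minimal sketch of that last step, assuming two short stand-in lists in place of the ones read from the name files (the @pool variable is a name introduced only for this example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tiny stand-in lists; the real script fills these from the name files.
my @maleList   = qw(James John Robert);
my @femaleList = qw(Mary Patricia Linda);

# Coin toss, then choose the list that matches the result.
my $gender = int( rand(2) ) == 0 ? 'female' : 'male';
my @pool   = $gender eq 'male' ? @maleList : @femaleList;

# rand @pool returns a fractional number below the list length;
# using it as an array index truncates it to a whole number.
my $firstName = $pool[ rand @pool ];

print "$gender: $firstName\n";
```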

Generating Random Phone Numbers:

Now, I could make an API call to look up the area codes of a state once a state is chosen at random, but I’m not getting too specific at this point. All I really care about is that the number is in phone number format: (xxx) xxx-xxxx.

To do this, I just randomly generated 10 digits, and concatenated them into the correct format:

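# first digit: 1 to 9, so the number never starts with 0
my $n1 = int(rand(9)) + 1;
# remaining digits: int(rand(10)) gives a whole number from 0 to 9
my $n2 = int(rand(10));
my $n3 = int(rand(10));
my $n4 = int(rand(10));
my $n5 = int(rand(10));
my $n6 = int(rand(10));
my $n7 = int(rand(10));
my $n8 = int(rand(10));
my $n9 = int(rand(10));
my $n10 = int(rand(10));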

my $phoneNumber = "($n1$n2$n3) $n4$n5$n6-$n7$n8$n9$n10";
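
The same format can be built more compactly. This is a minimal sketch rather than the code from the repository; random_digits is a hypothetical helper name introduced for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return $count random digits (0-9) as one string.
sub random_digits {
    my ($count) = @_;
    return join '', map { int( rand(10) ) } 1 .. $count;
}

# The first digit is 1-9 so the area code never starts with 0.
my $area     = ( 1 + int( rand(9) ) ) . random_digits(2);
my $exchange = random_digits(3);
my $line     = random_digits(4);

# sprintf drops the three parts into the (xxx) xxx-xxxx pattern.
my $phoneNumber = sprintf '(%s) %s-%s', $area, $exchange, $line;

print "$phoneNumber\n";
```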

Generating Random Zip Codes:

Same thing with zip codes:

# first digit: 1 to 9; remaining digits: 0 to 9
my $z1 = int(rand(9)) + 1;
my $z2 = int(rand(10));
my $z3 = int(rand(10));
my $z4 = int(rand(10));
my $z5 = int(rand(10));

my $zipcode = $z1 . $z2 . $z3 . $z4 . $z5;
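
As an alternative sketch (not what the repository does), a zip code can be produced in one step by zero-padding a random integer. Note the difference in behavior: this version allows a leading zero, which real US zip codes in the Northeast do have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick an integer from 0 to 99999 and left-pad it with zeros to 5 digits.
my $zipcode = sprintf '%05d', int( rand(100_000) );

print "$zipcode\n";
```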

Generating Random Emails:

Emails I thought might be a little different. For example, if I’ve already generated a Random Name, I could simply do something like:

firstName.lastName@emailServer.com

Although, that doesn’t seem very believable, so I wanted to make emails come in a few varieties.

my @emails = ("gmail", "hotmail", "protonmail", "yahoo", "aol");

# rand @emails returns a fractional number below the array length;
# using it as an index truncates it to a whole number
my $index = rand @emails;
my $email = $emails[$index];

my $index2 = rand @adjectiveList;
my $adjective = $adjectiveList[$index2];

my $index3 = rand @animalsList;
my $myAnimal = $animalsList[$index3];

Each time the code is run, an email service will be picked at random from a list of 5. Then an adjective and animal will be picked at random from a list of several hundred each. This would allow me to create emails formatted like:

fuzzy.kitten.01@gmail.com

or

firstName.the.terrible@yahoo.com

With those variables in place, I created 5 format types, and chose a number randomly between 0 and 4. Depending on the outcome, the new email gets the corresponding format. Here’s how that looks:

my $emailAddress = "";
my $girlGuy = "";
if ($gender eq "male") {
    $girlGuy = "guy";
} else {
    $girlGuy = "girl";
}

my $emailFormat = int(rand(5));
if ($emailFormat == 0) {
    $emailAddress = $lastName[0] . "." . $lastName[1] . $sN3 . $z2 . "@" . $email . ".com";
} elsif ($emailFormat == 1) {
    $emailAddress = $adjective . "_" . $myAnimal . $sN1 . "@" . $email . ".com";
} elsif ($emailFormat == 2) {
    $emailAddress = $adjective . "_" . $lastName[0] . $sN1 . "@" . $email . ".com";
} elsif ($emailFormat == 3) {
    $emailAddress = $newState . "_" . $girlGuy . $sN1 . $sN2 . $sN3 . "@" . $email . ".com";
} elsif ($emailFormat == 4) {
    $emailAddress = $adjective . "_" . $newCity . "_" . $girlGuy . $sN1 . $sN2 . $sN3 . "@" . $email . ".com";
}

$emailAddress =~ s/\s+//g;
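
The if/elsif chain could also be written as a list of small subroutines, one per format, with one picked at random. The following is only a sketch of that idea: the %user hash, its keys, and its values are hypothetical stand-ins for the variables the real script generates earlier, and only three formats are shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in values (hypothetical); the real script generates these earlier.
my %user = (
    firstName => 'Maggie',
    lastName  => 'Davila',
    adjective => 'fuzzy',
    animal    => 'kitten',
    city      => 'Duarte',
    digits    => '784',
    provider  => 'gmail',
);

# One anonymous sub per format; each builds the part before the "@".
my @formats = (
    sub { my ($u) = @_; return $u->{firstName} . '.' . $u->{lastName} . $u->{digits} },
    sub { my ($u) = @_; return $u->{adjective} . '_' . $u->{animal} . $u->{digits} },
    sub { my ($u) = @_; return $u->{adjective} . '_' . $u->{city} . $u->{digits} },
);

# Pick one format at random and call it with the user data.
my $localPart    = $formats[ rand @formats ]->( \%user );
my $emailAddress = $localPart . '@' . $user{provider} . '.com';

# Remove any whitespace, as the original code does.
$emailAddress =~ s/\s+//g;

print "$emailAddress\n";
```

With this layout, adding a sixth format means adding one sub to the list; the random pick adjusts to the list length on its own.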

Generating a random job and degree:

Another fun one for me was generating a job and a degree. I used a list of job titles and selected one at random, similar to the way I selected a male or female name. Then, with my new random job, I take a piece of the job title using Perl’s substr function and compare that substring against a huge list of potential degree programs. For example, if the job is:

Electrical Engineer

Then I might grab just the first 5 letters of the job title:

Elect

And loop through all of the degree programs in that list, seeing if there’s a match. If there is a match, I’ll add the match to the list of degrees this new Random User has. I have lists of Bachelors, Associates and Masters programs, so I check them all, and push the matches into a degrees array:

my $substring = substr $job, 0, 5;

my @degrees = ();
foreach my $x (@bachelorList) {
    if (index($x, $substring) != -1) {
        push @degrees, $x;
    }
}

foreach my $x (@associateList) {
    if (index($x, $substring) != -1) {
        push @degrees, $x;
    }
}

foreach my $x (@masterList) {
    if (index($x, $substring) != -1) {
        push @degrees, $x;
    }
}

my $degree = "";

foreach my $x (@degrees) {
    $degree = $degree . ", " . $x;
}

At the end, I simply concatenate all of the results into a string.
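
One side effect of building the string in a loop is that it starts with a separator, which shows up as a leading ", " in the DEGREE field of the sample output further down. A possible variation, sketched here with a short stand-in list (@degreeList is a hypothetical name, not one from the repository), uses grep for the matching and join for the final string. It also matches case-insensitively, which the index version does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $job = 'Accountant';

# Stand-in degree list; the real script loads three much longer lists.
my @degreeList = (
    'Bachelor of Accountancy',
    'Bachelor of Fine Arts',
    'Master of Accountancy',
);

my $substring = substr $job, 0, 5;    # "Accou"

# \Q...\E escapes any regex metacharacters in the job title;
# the /i flag makes the comparison case-insensitive.
my @degrees = grep { /\Q$substring\E/i } @degreeList;

# join puts ", " only between entries, so there is no leading separator.
my $degree = join ', ', @degrees;

print "$degree\n";    # Bachelor of Accountancy, Master of Accountancy
```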

Without going through every decision I made, I eventually end up with a hash reference of results, which I print using Data::Dumper. It looks like this, with variables as the hash values:

my $json = {
    NAME => $name,
    WEIGHT => $finalWeight,
    HEIGHT => $finalHeight,
    EYE_COLOR => $eyeColor,
    BODY_TYPE => $bodyType,
    HAIR_COLOR => $hairColor,
    SKIN_COLOR => $skinColor,
    RELIGION => $myReligion,
    SCHOOL => $school,
    HIGHEST_LEVEL_OF_EDUCATION => $highestLevelOfEducation,
    AGE => $age,
    STUDENT_LOAN_DEBT => '$' . $studentLoanDebt . '.00',
    ANNUAL_SALARY => '$' . $annualSalary . '.00',
    CHILDREN => $kids,
    BIRTHDAY => $birthDate,
    GENDER => $gender,
    MARITAL_STATUS => $married,
    SPOUSE => $spouse,
    ADDRESS => $streetNumber . " " . $newStreet . ", " . $newCity . ", " . $newState . " " . $zipcode,
    PHONE => $phoneNumber,
    EMAIL => $emailAddress,
    OCCUPATION => $job,
    HOBBIES => $hobby1 . ", " . $hobby2 . ", " . $hobby3,
    DEGREE => $degree,
    ETHNICITY => $ethnicBackground
};

print Dumper $json;

And my output usually looks like something along these lines:

{
  'ADDRESS' => '7841 3rd Street West, Duarte, Minnesota 87604',
  'GENDER' => 'female',
  'SPOUSE' => 'Raleigh Teer',
  'STUDENT_LOAN_DEBT' => '$6336.00',
  'HAIR_COLOR' => 'Dark Brown',
  'HEIGHT' => '6 feet, 10 inches',
  'EMAIL' => 'creepy_Duarte_girl784@aol.com',
  'SKIN_COLOR' => 'Light Brown',
  'RELIGION' => 'Stoicism',
  'HOBBIES' => 'Golf, Tutoring Children, Photography',
  'BIRTHDAY' => 'March 7, 1983',
  'PHONE' => '(361) 734-5735',
  'WEIGHT' => '327 lbs',
  'AGE' => 35,
  'MARITAL_STATUS' => 'married',
  'DEGREE' => ', Bachelor of Accountancy, Master of Accountancy, Master of Accounting and Information Systems',
  'ETHNICITY' => 'Chinese Australian',
  'SCHOOL' => 'University of Minnesota-Duluth',
  'NAME' => 'Maggie Davila',
  'HIGHEST_LEVEL_OF_EDUCATION' => 'Masters Degree',
  'BODY_TYPE' => 'Thick',
  'EYE_COLOR' => 'Black',
  'OCCUPATION' => 'Accountant',
  'CHILDREN' => 5,
  'ANNUAL_SALARY' => '$68077.00'
};
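
If the goal is to hand the profile to another program, the same hash reference could be serialized as JSON instead of dumped. The sketch below is an assumption about how that might look, not code from the repository. It uses JSON::PP, which has shipped with Perl since version 5.14, and a three-key stand-in for the full profile.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Stand-in data; the real hash reference holds every generated trait.
my $user = {
    NAME   => 'Maggie Davila',
    AGE    => 35,
    GENDER => 'female',
};

# canonical sorts the keys so the output is stable between runs;
# pretty adds line breaks and indentation.
my $encoder = JSON::PP->new->canonical->pretty;
my $text    = $encoder->encode($user);

print $text;
```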

This is a pretty fun project, and even if it isn’t extremely useful, I still learned a lot about Perl in doing it. Hope you enjoy.


Published by HackerNoon on 2018/06/21