Generating Databases for Tests or Other Purposes

Written by wbk#!$4 | Published 2020/10/03
Tech Story Tags: hacking | database | tips-and-tricks | python3 | testing | word-lists | privacy | data-science

TLDR An information security student created a database to expose marketing companies' dirty practices. Names, passwords (for greater credibility) and email addresses were added to the database.

Some time ago, a friend told me that she was having trouble testing a certain application. All the test solutions she tried stressed the platform correctly and generated relevant scalability results, but at the same time they looked very artificial, she said. Well, as a self-proclaimed information security student, also known as the weird-looking guy, I could adapt some of my studies to the case.

The Great Wall of Context

At that time, I was studying how some marketing companies buy hacked data on shady forums. In my youthful imagination, it would be interesting to expose these companies and their dirty practices. I could pass for a criminal selling what they were looking for. To trade, I would need a database. And to have a database, I would need ... well, data.

The Not-So-Big Scope Wall: What's the Plan?

What would marketing companies expect to find in a database? People. More precisely, ways to find them. That said, my idea was to create a simple database: Names, passwords (for greater credibility) and email addresses.

Weird Names and Where to Find Them

Fortunately, the internet is a wonderful place full of people with extravagant hobbies. Finding name banks, that is, sites with endless lists of names in the most varied languages, is not difficult.
After browsing hundreds of sites, I narrowed the search to ready-made lists compiled by people with purposes similar to mine. I found not one, but several lists in this repository on GitHub, and I thank the owner for having kindly organized all the credits. Thus, the content was ready for use.
Having the list of names, I just needed to search for passwords and, mainly, the most used email domains. Yes, only three fields per individual. It was enough, and the code would always allow me to include more information if necessary.
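As a rough illustration of the idea, here is a minimal sketch of how the three fields might be combined into one record per individual. It assumes the names, passwords and email domains are already loaded into plain Python lists; the sample values, the `make_record` helper and the email format are hypothetical and are not taken from the repository or from my original code.

```python
import random

# Hypothetical sample values; in practice these would be read from the word lists.
NAMES = ["Ana Souza", "John Smith", "Mei Tanaka"]
PASSWORDS = ["123456", "qwerty", "letmein"]
DOMAINS = ["example.com", "example.org"]


def make_record(rng):
    """Build one fake individual: name, email address and password."""
    name = rng.choice(NAMES)
    # Derive the mailbox from the name so the two fields look related.
    local_part = ".".join(name.lower().split())
    # A numeric suffix makes repeated names less likely to share an address.
    email = f"{local_part}{rng.randint(1, 99)}@{rng.choice(DOMAINS)}"
    return {"name": name, "email": email, "password": rng.choice(PASSWORDS)}


rng = random.Random(42)  # fixed seed so the same data set can be regenerated
records = [make_record(rng) for _ in range(5)]
for record in records:
    print(record)
```

Including more information per individual would only mean adding one more key to the dictionary returned by `make_record`.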

We Have the Loot, Now What?

So, let's take all the ingredients and organize them. At this stage, I had already cloned the repository, added the passwords and created an environment that made the work easier. All I had to do was generate data sets, trying to maintain as much fidelity as possible to a real database. I even passed the passwords through a hash algorithm (MD5, for no particular reason), even though databases with plain-text passwords are more valuable. It is worth mentioning that I had no intention of actually selling a fabricated file, just of receiving offers from companies and exposing them.
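The hashing and export step could look roughly like the sketch below. It uses `hashlib.md5` only because MD5 is the algorithm mentioned above; the sample records, the `md5_hex` helper, the column names and the CSV output format are assumptions made for illustration.

```python
import csv
import hashlib
import io


def md5_hex(password):
    """Return the MD5 digest of a password as a 32-character hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical records standing in for the generated data set.
records = [
    {"name": "Ana Souza", "email": "ana.souza@example.com", "password": "123456"},
    {"name": "John Smith", "email": "john.smith@example.org", "password": "qwerty"},
]

# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "email", "password_md5"])
writer.writeheader()
for record in records:
    writer.writerow({
        "name": record["name"],
        "email": record["email"],
        "password_md5": md5_hex(record["password"]),
    })

output = buffer.getvalue()
print(output)
```

Keeping the hash in its own column makes it easy to swap the algorithm, or to go back to plain text, without touching the rest of the export.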

All Ready (?)

At this point, when I was ready to dig into the dark corners of the internet, where bad guys eat screws for breakfast (without any milk), I received the message from my friend mentioned at the beginning of this article.
I realized that my little Frankenstein could be useful not only for my goals, but also for those of honest and hardworking developers trying to make a living. She suggested that I share the story in this text, because perhaps this concept is useful for others with similar problems. And well, this is it. I have no surprising ending, nor one that will change your life. This is just another one of those stories that is always coolest in the narrator's mind.
Unless you belong to a company that buys leaked data. In that case, be careful or I may be your next contact on the supplier list.
For more morally questionable content, you can find me on LinkedIn, where I pass for just one more citizen like any other.

Written by wbk#!$4 | Writer, editor, hunter of flying krakens and, sometimes, paranoid with security and privacy.
Published by HackerNoon on 2020/10/03