How our test data generator makes fake data look real

by Tom Winter

We recently released DataFairy, a free tool that generates test data. But first, let me tell you the story of how it came about.

This is the story of how we turned a fun open source side project into something that has turned out to be really useful.

This is not about fake news or tricking the masses. But the fact remains that for developers, software testers, and really anyone who has ever given a demo, fake data is essential and is surprisingly difficult to make up off the top of your head.

Our story with fake data starts back when we first developed our SaaS tool, Devskiller. Like every application, ours needed users. We weren't even looking for paying users at this point; we just needed candidate profiles for our application. What we needed was dummy data that looked real.

We needed a test data generator

We needed fake data for a couple of reasons:

1. We needed to see if our system worked

This meant that we needed to build a number of different dummy profiles to see if the system stored and displayed them correctly.

2. We needed to sell our product

We needed to do demos for our first prospective customers. We wanted to show our customers what the system would look like after 6 months of inviting and testing hundreds of candidates.

Our first thought was to look for an existing test data generator. The problem is that data is hard to fake convincingly.

A lot of data is validated algorithmically

If it were easy to make convincing data, we probably wouldn't need a tool. But generating data can be tricky for a couple of reasons.

Fake data is more than just random numbers. Take a credit card number: most credit card numbers end in a check digit computed with something called the Luhn algorithm. To explain this, we'll use the example of a Visa card:

How to check if a credit card number is valid

Before you start, it's important to know that all Visa card numbers start with a 4, and that they have traditionally been either 13 or 16 digits long.

Take this Visa card number:

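To validate the number, start from the rightmost digit (the check digit) and, moving left, double every second digit. The check digit itself is not doubled. For a 16-digit number like this one, that means doubling every other digit starting with the first digit in the sequence.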

4574487405351567
(4x2), (7x2), (4x2), (7x2), (0x2), (3x2), (1x2), (6x2)
8, 14, 8, 14, 0, 6, 2, 12

If doubling gives a two-digit number, add its two digits together to get a single digit.

8, 5, 8, 5, 0, 6, 2, 3
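
The doubling and digit-reduction steps above can be sketched in a few lines of Python. This illustrates the arithmetic only. It is not JFairy's or DataFairy's code, and the helper name `double_and_reduce` is our own. The sketch relies on a small shortcut: for a doubled digit between 10 and 18, adding its two digits gives the same result as subtracting 9.

```python
def double_and_reduce(digit: int) -> int:
    """Double one digit and collapse a two-digit result into a single digit."""
    doubled = digit * 2
    # For 10..18, the sum of the two digits always equals doubled - 9 (14 -> 1 + 4 = 5).
    return doubled - 9 if doubled > 9 else doubled


card = "4574487405351567"
# In this 16-digit number, the digits at 0-based positions 0, 2, 4, ... are doubled.
to_double = [int(ch) for ch in card[0::2]]
print([d * 2 for d in to_double])                 # [8, 14, 8, 14, 0, 6, 2, 12]
print([double_and_reduce(d) for d in to_double])  # [8, 5, 8, 5, 0, 6, 2, 3]
```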

Then go back to the original credit card number and replace each digit you doubled with its new value.

8554885405652537

Use the single-digit values here, not the raw doubled values. Now add all the digits up.

8+5+5+4+8+8+5+4+0+5+6+5+2+5+3+7=80

Finally, check whether the sum is evenly divisible by 10. In this case it is (80), so the number is valid.
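
Put together, the whole check fits in one short Python function. This is a minimal sketch of the standard Luhn check, and the function names are hypothetical. `looks_like_visa` adds the prefix and length rules described above. It is only a format check and does not prove that a card exists. Because positions are counted from the right, the same code handles both 13- and 16-digit numbers.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Position 0 is the rightmost digit (the check digit); every second digit
    # to its left is doubled.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits together
        total += digit
    return total % 10 == 0


def looks_like_visa(number: str) -> bool:
    """Format check only: leading 4, 13 or 16 digits, and a valid Luhn checksum."""
    return number.startswith("4") and len(number) in (13, 16) and luhn_is_valid(number)


print(luhn_is_valid("4574487405351567"))  # True: the digits sum to 80
print(luhn_is_valid("4574487405351568"))  # False: changing the last digit breaks it
```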

Validating credit card numbers at scale takes code. Still, credit card numbers are relatively easy pieces of data to get right. We didn't just need individual pieces of verifiable data; we needed entire profiles.
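
Generating a fake number is the same check run in reverse. You pick the leading digits, then compute the one check digit that makes the Luhn sum divisible by 10. The Python below is a minimal sketch of that approach, and its function names are hypothetical. The numbers it produces are intended only as format-valid test data.

```python
import random
from typing import Optional


def luhn_check_digit(payload: str) -> int:
    """Return the digit that makes payload + digit pass the Luhn check."""
    total = 0
    # Once the check digit is appended, the payload's rightmost digit lands in a
    # doubled position, so doubling starts at offset 0 here.
    for position, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def fake_visa_number(length: int = 16, rng: Optional[random.Random] = None) -> str:
    """Random Luhn-valid number that starts with 4 (for test data only)."""
    if rng is None:
        rng = random.Random()
    body = "4" + "".join(str(rng.randrange(10)) for _ in range(length - 2))
    return body + str(luhn_check_digit(body))


print(luhn_check_digit("457448740535156"))  # 7, the last digit of the example card
print(fake_visa_number(rng=random.Random(42)))
```

Passing a seeded `random.Random` makes the output reproducible, which is handy when tests need the same fake data on every run.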

Verifiable profiles need different kinds of data that relate to each other logically

Credit card numbers are relatively easy to generate because they only relate to themselves. But personal identity numbers often encode other facts about a person. Take the Swedish personal identity number, commonly called the personnummer.

For those of you who don’t know, personnummers are designed for paying taxes, sort of like an American Social Security number. But they’re also used as a way to access services like healthcare and schools as well as non-governmental services like credit ratings.

The format of a personnummer is slightly different from that of a credit card. It is a 10-digit number split into a six-digit section and a four-digit section, joined by a hyphen.

Cool fact: Swedes over the age of 100 replace the hyphen in their personnummer with a plus sign.

The first six digits of the personnummer are simple: they give the person's birth date in YYMMDD format. In the four-digit section, the first three digits are a serial number, and the third serial digit is odd for males and even for females. The last digit is a checksum digit.

So if you take the personnummer:

601128-9235

You know that it is for a man born November 28th, 1960.

60 (year) 11 (month) 28 (day) - (hyphen: under 100 years old) 923 (serial number; the final 3 is odd, so male) 5 (checksum digit)
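
The layout above translates directly into a parser. The Python sketch below is illustrative, and its names are hypothetical. It assumes the 10-digit form with a separator and takes a reference date `today` so the century can be worked out. The century rule it uses is simplified and based on the year alone. The sketch does not verify the checksum.

```python
import re
from dataclasses import dataclass
from datetime import date

# YYMMDD, separator ("-", or "+" for people aged 100 or more), 3-digit serial, check digit
PNR_PATTERN = re.compile(r"([0-9]{2})([0-9]{2})([0-9]{2})([-+])([0-9]{3})([0-9])")


@dataclass
class Personnummer:
    birth_date: date
    aged_100_plus: bool
    serial: str
    is_male: bool
    check_digit: int


def parse_personnummer(text: str, today: date) -> Personnummer:
    match = PNR_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"expected YYMMDD-NNNC or YYMMDD+NNNC, got {text!r}")
    yy, mm, dd, separator, serial, check = match.groups()
    # Simplified century rule: the latest year ending in YY that is not after
    # today's year, moved back 100 years when the separator is "+".
    year = today.year - (today.year - int(yy)) % 100
    if separator == "+":
        year -= 100
    return Personnummer(
        birth_date=date(year, int(mm), int(dd)),  # raises ValueError on impossible dates
        aged_100_plus=separator == "+",
        serial=serial,
        is_male=int(serial[2]) % 2 == 1,  # third serial digit: odd = male
        check_digit=int(check),
    )


p = parse_personnummer("601128-9235", today=date(2018, 1, 1))
print(p.birth_date, p.is_male, p.check_digit)  # 1960-11-28 True 5
```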

To calculate the checksum, multiply each of the first nine digits (everything except the checksum digit itself) by the corresponding digit in the number 212121-212, that is, alternately by 2 and 1.

(6x2)(0x1)(1x2)(1x1)(2x2)(8x1)(9x2)(2x1)(3x2)
12, 0, 2, 1, 4, 8, 18, 2, 6

Just like with the Visa card above, if the product of any of these numbers results in a two digit number, simply add the two digits together.

3, 0, 2, 1, 4, 8, 9, 2, 6

Then add all the resulting digits together.

3+0+2+1+4+8+9+2+6=35

To get the checksum digit, subtract the last digit of the sum from 10. The exception: if the last digit is 0, the checksum digit is also 0.

10 - 5 = 5
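
Here is the same arithmetic in Python, as a minimal sketch with hypothetical function names. The weighting 2, 1, 2, 1, ... from the left is the Luhn scheme again, applied to the nine digits that precede the check digit. The validator assumes the 11-character form, with a hyphen or plus sign in the seventh position.

```python
def personnummer_check_digit(first_nine: str) -> int:
    """Check digit for the nine digits YYMMDDNNN (separator removed)."""
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine digits")
    total = 0
    for index, ch in enumerate(first_nine):
        weight = 2 if index % 2 == 0 else 1  # weights 2,1,2,1,2,1,2,1,2 (212121-212)
        product = int(ch) * weight
        total += product // 10 + product % 10  # add the digits of a two-digit product
    return (10 - total % 10) % 10  # a sum ending in 0 gives check digit 0


def is_valid_personnummer(text: str) -> bool:
    """Validate the YYMMDD-NNNC / YYMMDD+NNNC form and its check digit."""
    if len(text) != 11 or text[6] not in "-+":
        return False
    digits = text[:6] + text[7:]
    if not (digits.isascii() and digits.isdigit()):
        return False
    return personnummer_check_digit(digits[:9]) == int(digits[9])


print(personnummer_check_digit("601128923"))  # 5
print(is_valid_personnummer("601128-9235"))   # True
print(is_valid_personnummer("601128-9236"))   # False
```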

So if you were generating a profile to go with this number, it couldn't be a woman born on April 10th, 1916. Her personnummer would have to look something like 160410+1244: her birth date, a plus sign because she is over 100, an even third serial digit (4), and a matching check digit. In other words, you can't just come up with a random number and expect it to fit any fake profile you've generated.
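
Generating a number that fits a profile means working forward from the profile's facts instead of picking digits at random. The Python sketch below uses hypothetical names and is not JFairy's implementation. It takes the birth date and sex as inputs, randomizes only the free serial digits, and computes the check digit. It assumes the plus sign applies from the calendar year in which the person turns 100.

```python
import random
from datetime import date
from typing import Optional


def personnummer_check_digit(first_nine: str) -> int:
    """Weights 2,1,2,1,... from the left; digits of two-digit products are added."""
    total = 0
    for index, ch in enumerate(first_nine):
        product = int(ch) * (2 if index % 2 == 0 else 1)
        total += product // 10 + product % 10
    return (10 - total % 10) % 10


def fake_personnummer(birth_date: date, is_male: bool, today: date,
                      rng: Optional[random.Random] = None) -> str:
    """Random personnummer consistent with the given birth date and sex."""
    if rng is None:
        rng = random.Random()
    yymmdd = f"{birth_date.year % 100:02d}{birth_date.month:02d}{birth_date.day:02d}"
    # Assumption: "+" from the calendar year the person turns 100, "-" before that.
    separator = "+" if today.year - birth_date.year >= 100 else "-"
    # The first two serial digits are free; the third encodes sex (odd = male).
    sex_digit = rng.choice("13579" if is_male else "02468")
    serial = f"{rng.randrange(100):02d}{sex_digit}"
    return f"{yymmdd}{separator}{serial}{personnummer_check_digit(yymmdd + serial)}"


print(fake_personnummer(date(1916, 4, 10), is_male=False, today=date(2018, 1, 1)))
```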

We needed logical test data

The data would need to relate to each other logically, because the personnummer isn't the only piece of data built on information from elsewhere in a profile: most types of identification numbers relate to other personal details in some way. We simply couldn't find a test data generator that did this, so we decided to build our own. It turns out we weren't the only ones with this problem.
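
One way to keep fields consistent is to generate the underlying facts first (sex, name, birth date) and derive everything else from them. The Python below is a minimal sketch of that idea. It does not show how JFairy works internally. The tiny name lists are placeholders, and email addresses use example.com, a domain reserved for documentation by RFC 2606.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Placeholder name pools; a real generator would use large, locale-specific lists.
FIRST_NAMES = {"male": ["Erik", "Lars", "Johan"], "female": ["Anna", "Karin", "Eva"]}
LAST_NAMES = ["Andersson", "Johansson", "Karlsson"]


@dataclass
class Profile:
    first_name: str
    last_name: str
    sex: str
    birth_date: date
    age: int
    email: str


def age_on(birth_date: date, today: date) -> int:
    """Whole years between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def fake_profile(today: date, rng: Optional[random.Random] = None) -> Profile:
    if rng is None:
        rng = random.Random()
    sex = rng.choice(["male", "female"])
    first = rng.choice(FIRST_NAMES[sex])  # first name agrees with sex
    last = rng.choice(LAST_NAMES)
    # Days 1-28 keep every generated date valid in every month.
    birth = date(today.year - rng.randint(18, 70), rng.randint(1, 12), rng.randint(1, 28))
    email = f"{first}.{last}@example.com".lower()  # email derived from the name
    return Profile(first, last, sex, birth, age_on(birth, today), email)


print(fake_profile(date(2018, 6, 1), rng=random.Random(1)))
```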

JFairy

As regular contributors to the open source community, we decided that the best way to generate the test data we needed was to build our own library. We called it JFairy, and our goal was for it to generate sets of data that were both verifiable and logically connected.

This way we could populate our app with users. Our user data couldn't be gibberish, or it wouldn't be believable. So we put the library to work, and it performed better than we could have expected. It even generates real people from time to time. We found this out because we used Gravatar, which looks up profile pictures by email address, to show candidate pictures. We were surprised when a real photo appeared on our test account.
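
The mechanism behind the Gravatar surprise is simple. Gravatar serves images keyed by a hash of an email address, so any generated address that happens to match one a real person registered resolves to that person's photo. Below is a minimal Python sketch of the URL construction. The function name is our own. MD5 of the trimmed, lower-cased address is the scheme Gravatar has long documented, and its current documentation also describes SHA-256.

```python
import hashlib


def gravatar_url(email: str) -> str:
    """Gravatar image URL for an email address (MD5 of the trimmed, lower-cased address)."""
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}"


# Formatting differences disappear after normalization, so both map to one image.
print(gravatar_url(" Jane.Doe@Example.com ") == gravatar_url("jane.doe@example.com"))  # True
```

Generating addresses on a reserved domain such as example.com is one way to avoid these collisions.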

This was really useful when we started shopping our app around. We wanted to show enterprise clients an account with 300 different test candidates on the platform. Without JFairy, each of us could have run through the app a few times by hand, but there were only five of us on the team. It would have been impractical for five people to come up with 300 logically connected fake profiles.

The data generated by JFairy proved so convincing that new customers were puzzled about where we had found all these people to test. In fact, they asked whether we could help them source new developers, since as far as they could tell we were in touch with many people with technical backgrounds, some of whom even had validated skills.

We needed to let the open source community have a look at JFairy

We realized that this was becoming something bigger than ourselves, so we decided to release it as open source. The first reason is that we are all avid users of open source code, and we know it's important to give back to that community if we want to keep getting things in return. But on top of that, open source can bring real benefits back to the product. By putting our project out there for many different developers to look at, we can get new ideas that we would never have considered.

The most notable contributions were new languages. We originally built JFairy to generate data only for English and Polish speakers; after all, we are limited by the languages we know well. But of course, it could be a useful tool for people from many different countries. Through open source contributions, we've been able to add support for data in Spanish, French, German, Swedish, and Chinese.

We also realized that while we were reaching a great group of users among software developers, JFairy had applications well beyond people who know how to code. So we decided to build on the library's success and create an app that could bring it to more use cases and more people.

DataFairy gives everyone access to fake data

JFairy proved to be super useful for developers who knew how to code, but they weren’t the only people out there who would use the data JFairy generated. Software testers need to be able to populate their systems to see if they work. Salespeople and marketers need data to make their demos look realistic. To make JFairy useful to the most people, we had to make its fake data easy to access.

With that goal in mind, we built DataFairy. DataFairy is an app powered by JFairy so you can access our fake data without having to learn to code first. The data is presented in a neat notebook interface. To get more than one fake profile, you can either generate a new profile or export a bulk list of up to 100 profiles to a CSV file. It is a free and easy way to populate your software with logically connected valid data.

Our plans for DataFairy's future

DataFairy can always be improved and have new features added. In addition to our own efforts, we want to stick to the tenets of the open source community. We continue to solicit new languages to add to our roster, and we have an open GitHub project. We would also love to eventually have users add sample data. This will help us build a community of participants who will help DataFairy grow and become useful to more people.

Whether you need to download large batches of logically validated data or simply want to have fun reading the profiles that pop up, check out DataFairy.

Source: https://www.freecodecamp.org/news/how-our-test-data-generator-makes-fake-data-look-real-ace01c5bde4a/
