Surfacing Visualization Mirages

When reading a visualization, is what we see really what we get?

This post summarizes and accompanies our paper “Surfacing Visualization Mirages” that was presented at CHI 2020 with a best paper honorable mention. This post was written collaboratively by Andrew McNutt, Gordon Kindlmann, and Michael Correll.

TL;DR

When reading a visualization, is what we see really what we get? There are a lot of ways that visualizations can mislead us, such that they appear to show us something interesting that disappears on closer inspection. Such visualization mirages can lead us to see patterns or draw conclusions that don’t exist in our data. We analyze these quarrelsome entities and provide a testing strategy for dispelling them.

Intro

The trained data visualization eye notices red flags that indicate that something misleading is going on. Dual axes that don’t quite match up. Misleading color ramps. Dubious sources. While learning how visualizations mislead is every bit as important as learning how they are created, even the studious can be deceived!

These dastardly deceptions need not be deviously devised either. While some visualizations are of course created by bad actors, most are not. Even designs crafted with the best of intentions yield all kinds of confusions and mistakes. A careless analyst might hallucinate meaning where there isn’t any or jump to a conclusion that is only hazily supported.

What can we say about the humble bar chart below on the left? It appears that location B has about 50% more sales than location A. Is the store in location A underperforming? Given the magnitude of the difference, I’d bet your knee-jerk answer would be yes.

Many patterns can hide behind aggregated data. For example, a simple average might hide dirty data, irregular population sizes, or a whole host of other problems. Simple aggregations like our humble bar chart are the foundation of many analytics tools, with subsequent analyses often being built on top of these potentially shaky grounds.
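
To make this concrete, here is a minimal Python sketch with made-up numbers (the `sales` records and the `summarize` helper are assumptions for illustration, not data or code from the paper). Location B's mean is 50% higher than location A's, yet it rests on three records and a single extreme value.

```python
from statistics import mean

# Hypothetical per-transaction sales for two locations (illustrative only).
sales = {
    "A": [98, 102, 100, 101, 99, 100, 103, 97],  # many consistent records
    "B": [20, 25, 405],                          # few records, one extreme value
}

def summarize(records):
    """Return the aggregate a bar chart would show, plus the context it hides."""
    return {
        "mean": mean(records),
        "count": len(records),
        "min": min(records),
        "max": max(records),
    }

for location, records in sales.items():
    print(location, summarize(records))
```

A bar chart of the two means alone would show 100 versus 150. The record count and range printed next to them are what cast doubt on the comparison.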

What are we to do about these problems? Should we stop analyzing data visually? Throw out our computers? Perhaps we can form a theory that will help us build a method for automatically surfacing and catching these quarrelsome errors?

A flow chart describing the visual analytics process.
The chart-making process is full of moments of agency for the chart creator. What counts as data? What is an appropriate way to manipulate that data? How do I show this data? How do I go about understanding it? The answers to all of these questions can affect the reader’s ultimate takeaways.

Enter Mirages

On the road to making a chart or visualization there are many steps and stages, each of which is liable to let error in. Consider a simplified model: an analyst decides how to curate data, how to wrangle it into a usable form, how to visually encode that data, and finally how to read it. When the analyst makes a decision, they exercise agency and create an opportunity for error, which can cascade along this pipeline, creating illusory insights.

Something as innocuous as defining the bins of a histogram can mask underlying data quality issues, which might in turn lead to incorrect inferences about a trend. Arbitrary choices about axis ordering in a radar chart can cause a reader to falsely believe one job candidate is good while another is lacking. Decisions about what type of crime actually counts as a crime can lead to maps that drive radically different impressions about the role of crime in a particular area.
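
The histogram case can be sketched in a few lines of Python. Everything here is assumed for illustration: the `ages` values are made up, and 99 stands in for a hypothetical "missing value" sentinel that was never cleaned out of the data.

```python
from collections import Counter

# Hypothetical ages; 99 is a made-up sentinel meaning "unknown".
ages = [23, 27, 31, 35, 38, 42, 44, 47, 51, 55, 58, 62, 66, 71, 76, 83, 91,
        99, 99, 99, 99, 99, 99]

def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

coarse = histogram(ages, 25)  # the sentinel is absorbed into the 75-99 bin
fine = histogram(ages, 1)     # the sentinel shows up as an isolated spike
print(coarse)
print(max(fine, key=fine.get), max(fine.values()))
```

With 25-wide bins the top bin simply looks well populated, which could be read as an older-skewing population. With unit-wide bins the same data shows six records piled on a single value, which is a prompt to go check the data rather than a trend.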

While charts tend to feel trustworthy, the harmless-seeming choices that create them can cause all sorts of hallucinations.

The first step in addressing a problem is often to name it, so we introduce a term for these errors: Visualization Mirages. We define them as

any visualization where the cursory reading of the visualization would appear to support a particular message arising from the data, but where a closer re-examination would remove or cast significant doubt on this support.

Mirages arise throughout visual analytics. They occur as the result of choices made about data. They come from design choices. They depend on what you are trying to do with the visualization. What may be misleading in the context of one task may not interfere with another. For instance, a poorly selected aspect ratio could produce a mirage for a viewer who wanted to know about the correlation in a scatterplot, but is unlikely to affect someone who just wants to find the biggest value.

A man crawls across a desert following a sign labeled “VA process” towards a mirage that is labeled “insights”
We all thirst for insight in visual analytics (or anywhere else). This desire can cause us to overlook important details or forget best practices.

The errors that create mirages have both familiar and unfamiliar names: Drill-down Bias, Forgotten Population or Missing Dataset, Cherry Picking, Modifiable Areal Unit Problem, Non-sequitur Visualizations, and so many more. An annotated and expanded version of this list is included in the paper supplement. There is a sprawling universe of subtle and tricky ways that mirages can arise.

To make matters worse, there are few automated tools to help the reader or chart creator know that they haven’t deceived themselves in pursuit of insight.

Do these things really happen?

Imagine you are curious about the trend of global energy usage over time. A natural way to address this question would be to fire up Tableau and drop in the World Indicators dataset, which consists of vital world statistics from 2000 to 2012. The trend over time (a) shows that there was a sharp decrease in 2012! This would be great news for the environment, were it not illusory, as we see in (b) when checking the set of missing records.

A line chart with the caption energy down? A bar chart with the caption Count of Nulls. A line chart with energy up?

If we try to quash these data problems by switching the aggregation in our line chart from SUM to MEAN, we find that the opposite is true!! There was a sharp increase in 2012. Unfortunately this conclusion is another mirage. The only non-null entries for 2012 are OECD countries. These countries have much higher energy usage than other countries across all years (d).
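
The SUM-versus-MEAN flip is easy to reproduce on a toy table. The sketch below uses invented records (four hypothetical countries named P, Q, R, and S, not the World Indicators data) in which only the high-usage countries report a value for 2012.

```python
# Hypothetical records: (country, is_oecd, year, energy_use or None).
records = [
    ("P", True, 2011, 90.0), ("Q", True, 2011, 110.0),
    ("R", False, 2011, 20.0), ("S", False, 2011, 30.0),
    ("P", True, 2012, 92.0), ("Q", True, 2012, 112.0),
    ("R", False, 2012, None), ("S", False, 2012, None),  # missing in 2012
]

def aggregate(year):
    """Aggregate one year, silently skipping nulls as many chart tools do."""
    values = [v for (_, _, y, v) in records if y == year and v is not None]
    nulls = sum(1 for (_, _, y, v) in records if y == year and v is None)
    return {"sum": sum(values), "mean": sum(values) / len(values), "nulls": nulls}

for year in (2011, 2012):
    print(year, aggregate(year))
```

Here the sum falls from 250.0 to 204.0 while the mean rises from 62.5 to 102.0, even though no country's usage changed by more than 2 units. Reporting the null count alongside either aggregate is what exposes both readings as artifacts of missing data.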

Two line charts. The left one shows energy usage vs. life expectancy over time; the right one shows energy use over time.

Given these irregularities we can try removing 2012 from the data, and focus on the gradual upward trend in energy usage in the rest of the data. As we can see on the left, it appears that energy usage is tightly correlated with average life expectancy; perhaps more power means a happier life for everyone after all. Unfortunately this too is a mirage. The y-axis of this chart has been altered to make the trends appear similar, and obscures the fact that energy use is flat for most countries.

Now of course, you’re probably saying:

But I’m really smart, I wouldn’t make this type of mistake

That’s great! Congrats on being smart. Unfortunately, even those with high data visualization literacy make mistakes. Visualizations are rhetorical devices that are easy to trust too deeply. Charting systems often give an air of credibility that they don’t necessarily warrant. It is often easier to trust your initial inferences and move on. Interactive visualizations with exploratory tools that might help to dispel a mirage are often only glanced at by casual readers. Sometimes you are just tired and miss something “obvious”.

A chart showing gun deaths in Florida over time
This infamous chart appears on first glance to be saying that ‘Stand Your Ground’ decreased gun deaths, but on closer inspection it shows the opposite! Terrifying! (The author of this chart wasn’t actually trying to confuse anyone; they were just trying to explore a new design language.)

Some visualization problems are easy to detect, such as axes pointed in an unintuitive or unconventional direction or a pie chart with more than a handful of wedges. This type of best-practice knowledge isn’t always available. For instance, what if you are trying to use a novel type of visualization? (A xenographic perhaps?) There’d be nothing beyond your intuition to help guide you.

Other, more terrifying, problems only arise for particular datasets when paired with particular charts. To address these we introduce a testing strategy (derived from Metamorphic Testing) that can identify some of this thorny class of errors, such as the aggregation masking unreliable inputs that we saw earlier with our humble bar chart.

Testing for errors is easy if you know the correct behavior of a system. Simply inspect the system and report your findings. For errors in the hinterlands of data and encoding, we are left without such a compass. Instead, we try to find guidance by identifying symmetries across data changes.

The order in which you draw the dots in a scatterplot shouldn’t matter, right? Yet, depending on the dataset, it often can!!! This can erase data classes or cause false inferences. We test for this property by shuffling the order of the input data and then comparing the pixel-wise difference between the two images. If the difference is above a certain threshold we know that there may be a problem. This is the essence of our technique: for a particular dataset, execute a change that should have a predictable result (here no change), and compare the results.
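
The shuffle test described above can be sketched in plain Python. This is a toy stand-in rather than the project's implementation: `render` is an assumed, deliberately crude rasterizer in which a later dot overwrites an earlier one, and the 1% threshold, the grid size, and both datasets are arbitrary illustrative choices.

```python
import random

def render(points, width=20, height=20):
    """Rasterize (x, y, color) points; later points overwrite earlier ones."""
    canvas = [[None] * width for _ in range(height)]
    for x, y, color in points:
        canvas[y][x] = color
    return canvas

def pixel_difference(a, b):
    """Fraction of pixels whose color differs between two renderings."""
    total = len(a) * len(a[0])
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return changed / total

def shuffle_test(points, threshold=0.01, trials=20, seed=0):
    """Metamorphic check: reordering the input should not change the image."""
    rng = random.Random(seed)
    baseline = render(points)
    worst = 0.0
    for _ in range(trials):
        shuffled = points[:]
        rng.shuffle(shuffled)
        worst = max(worst, pixel_difference(baseline, render(shuffled)))
    return worst > threshold, worst

cells = [(x, y) for x in range(5, 15) for y in range(5, 15)]
# Two classes plotted on the same cells: draw order decides which one is visible.
overlapping = [(x, y, "red") for x, y in cells] + [(x, y, "blue") for x, y in cells]
# One dot per cell: draw order cannot matter.
disjoint = [(x, y, "red") for x, y in cells]

print("overlapping flagged:", shuffle_test(overlapping)[0])
print("disjoint flagged:", shuffle_test(disjoint)[0])
```

In the overlapping dataset the baseline rendering shows only the class that was drawn last, so shuffling changes many pixels and the test flags it. The disjoint dataset renders identically under any order and passes. A real chart would need an actual renderer and an image comparison in place of `render` and `pixel_difference`.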

A series of 3 scatterplots. The first two show the same data but appear different. The third image highlights the differences
A simple scatterplot can hide the distributions it displays through draw order. This problem won’t affect every dataset, but here it hides the prevalence of the Americas in the middle of the distribution.

While it’s still in early development, we find that this approach can effectively catch a wide variety of visualization errors that fall in this intersection of matching encoding to data. These techniques can help surface errors in over-plotting, aggregation, missing aggregation, and a variety of other contexts. How to effectively compute these errors (as their computation can be burdensome) and how best to describe them to the user remain open challenges.

Where does that leave us?

Visualizations, and the people who create them, are prone to failure in subtle and difficult ways. We believe that visual analytics systems should do more to protect their users from themselves. One way these systems can do this is to surface visualization mirages to their users as part of the analytics process, which will hopefully guide them towards safer and more effective analyses. Our metamorphic testing for visualization approach is just one tool in the visualization validation toolbox. The right interface to accomplish this goal is still unknown, although applying a metaphor of software linting seems promising. For more details check out our paper, take a look at the code repo for the project, or watch our CHI talk.

Translated from: https://medium.com/multiple-views-visualization-research-explained/surfacing-visualization-mirages-8d39e547e38c
