The Best Way to Start Learning Data Science Is to Understand Its Context

Data Science Education

Table of Contents

  1. The Importance of Context Knowledge
  2. (Optional) Research Supporting Context-Based Learning
  3. The Context of Data Science
  4. Understand the Concept, not the Calculation
  5. The Context of the Sub-Disciplines
  6. Next Steps

The Importance of Context Knowledge

I decided to orient my career path towards data science during my senior year of university. It only took one or two research binges before I realized the vast depth of the field in front of me. I knew that eventually I’d have to understand things like the architecture of a convolutional neural network, the process of numericalization for NLP, or the underpinnings of principal component analysis. However, rather than jumping into the minutiae of these concepts in a void, I’ve always needed to develop a rock-solid contextual foundation of knowledge first. I’ll call this approach context-based learning.

What is context-based learning?

I will loosely define context-based learning as learning a concept by first focusing on its contextual elements. In other words, understanding the big picture before delving into the deep theory. It’s important to emphasize “first” in that definition, as learning the context is analogous to building the chassis of a vehicle. Although the chassis is an essential element, it is not a car, and it is non-functional on its own. Rather, it is the bedrock on which the car is built. In the same way, a contextual framework is the bedrock on which technical content is laid.

(Optional) Research Supporting Context-Based Learning

This style of learning leverages a fact well supported by research in the psychology of learning: humans retain knowledge most effectively by associating it with something they already have a firm grasp on, rather than by memorizing new concepts in a void. In short, we learn by association.

The late educational psychology professor Dr. Barak Rosenshine at the University of Illinois emphasized the importance of these contextual frameworks in education in Principles of Instruction:

“When one’s knowledge on a particular topic is large and well-connected, it is easier to learn new information and prior knowledge is more readily available for use.”

The amount of background knowledge you have is also correlated with how well you comprehend new material. Therefore, to learn most efficiently, one must develop a strong foundation of background knowledge before delving into the details.

The Context of Data Science

So what is the background knowledge, or context, of data science? Well, I always begin context-based learning by asking a lot of questions. Specifically, I try to ask broad, conceptual questions, as opposed to detail-oriented ones.

The following is a handful of questions I asked myself at the beginning of my data science journey, along with the answers I came up with. I want to emphasize that my answers filled the gaps in my own contextual knowledge at the time. In the same way, you should answer these and other questions in a manner that relates directly to your educational and personal background.

How does data science fit into my understanding of other fields?

Data science is an interdisciplinary field that leverages math, programming, business, and domain knowledge to tackle difficult data problems. The overlap between data science and my major (cognitive science with machine learning & neural computation) rests on math (which is necessary for machine learning), programming (which provides computational functionality for the field as a whole), and data analysis techniques, such as those used in computational neuroscience. The “science” in data science comes from its use of scientific methodologies, such as testing hypotheses for statistical significance.

What are the most important elements of data science, and how do they relate to one another?

Data scientists generally work through a process known as the “data science pipeline”, a step-by-step, end-to-end outline of a data scientist’s workflow. Acronyms like OSEMN (Obtain, Scrub, Explore, Model, iNterpret) make the basic pipeline easy to remember, although pipelines vary in their subtleties. The basic structure is as follows:

  • Data Collection
  • Data Cleaning
  • Exploratory Data Analysis
  • Model Building
  • Visualization / Model Deployment
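
To make the stages above concrete, here is a minimal sketch of the pipeline in plain Python. Everything in it is an assumption made for illustration: the function names (`collect`, `clean`, `explore`, `build_model`, `report`) are hypothetical, the records are invented toy data, and a real project would typically read from files or databases and use dedicated libraries at each stage.

```python
from statistics import mean


def collect():
    # Data collection: hypothetical in-memory records standing in for a file, API, or database.
    return [
        {"hours": "1", "score": "52"},
        {"hours": "2", "score": "61"},
        {"hours": "", "score": "70"},  # incomplete record, dropped during cleaning
        {"hours": "4", "score": "79"},
        {"hours": "5", "score": "88"},
    ]


def clean(rows):
    # Data cleaning: drop incomplete rows and convert strings to numbers.
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def explore(pairs):
    # Exploratory data analysis: a few summary statistics.
    hours, scores = zip(*pairs)
    return {"n": len(pairs), "mean_hours": mean(hours), "mean_score": mean(scores)}


def build_model(pairs):
    # Model building: fit score = slope * hours + intercept by ordinary least squares.
    hours, scores = zip(*pairs)
    mh, ms = mean(hours), mean(scores)
    slope = sum((h - mh) * (s - ms) for h, s in pairs) / sum((h - mh) ** 2 for h in hours)
    intercept = ms - slope * mh
    return slope, intercept


def report(summary, model):
    # Visualization / deployment: here reduced to a one-line text summary.
    slope, intercept = model
    return f"{summary['n']} rows; score ~ {slope:.1f} * hours + {intercept:.1f}"


pairs = clean(collect())
summary = explore(pairs)
model = build_model(pairs)
print(report(summary, model))
```

Each function takes the previous stage's output as its input, which is the main point of the pipeline idea: the stages can be understood, and later swapped out, one at a time.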

What is machine learning? And why is machine learning so tied to data science specifically?

Machine learning (ML) is a field that studies algorithms that are not traditional “closed” algorithms, in which every rule is written out by the programmer. Instead, ML algorithms “learn” their behavior from data. This reliance on data is what makes ML so integral to data science. ML belongs to the “model building” and “model deployment” stages of the data science pipeline.
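
The difference between a “closed” algorithm and one that learns from data can be sketched with a toy classifier. This is only an illustration under stated assumptions: `fixed_rule`, `learn_threshold`, and the six labeled samples are invented for this example, and picking a single threshold by counting mistakes is about the simplest possible form of learning, not how practical ML models are trained.

```python
def fixed_rule(x):
    # "Closed" algorithm: the threshold 5.0 is hard-coded by the programmer.
    return x > 5.0


def learn_threshold(samples):
    # "Learning": choose the threshold that makes the fewest mistakes on labeled data.
    # samples is a list of (value, label) pairs with boolean labels.
    values = sorted(v for v, _ in samples)
    # Candidate thresholds are the midpoints between neighboring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((v > t) != label for v, label in samples)

    return min(candidates, key=errors)


# Invented labeled data: values above roughly 3.5 belong to the positive class.
data = [(1.0, False), (2.0, False), (3.0, False), (4.0, True), (5.0, True), (6.0, True)]
threshold = learn_threshold(data)


def learned_rule(x):
    # Same form as fixed_rule, but the threshold came from the data.
    return x > threshold


print(threshold, fixed_rule(4.0), learned_rule(4.0))
```

If the data changed, `learned_rule` would change with it, whereas `fixed_rule` would have to be edited by hand.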

What are the sub-disciplines of data science?

There are many fields that contribute to data science, but the most fundamental disciplines that make it up are computer science, statistics, machine learning, and linear algebra. Although business and domain knowledge are also critical, the academic scope of data science rests on these four sub-disciplines. Furthermore, the sub-disciplines often have prerequisites of their own, such as the calculus needed to understand how machine learning algorithms work.

Understand the Concept, not the Calculation

One important dichotomy I discovered early in my undergrad math studies was the distinction between calculation and conceptual understanding. For example, in statistics, memorizing how to calculate this

[Image: formula for a chi-square test statistic]

is far less important than understanding the use case of a chi-square test statistic: testing hypotheses about the relationship between categorical variables. Or, for calculus, understanding that this

[Image: definite integral of a quadratic function]

describes the area underneath a quadratic curve is far more important than memorizing fancy methods to solve it by hand. (*ahem*)
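
One way to see the point is that, once the concept is clear, the calculation itself can be handed to a few lines of code. The sketch below makes several assumptions: the original formulas were images that are not available here, so the contingency table counts are invented, and the integral of x² from 0 to 3 is only a stand-in for whatever quadratic the author showed. The helper names `chi_square_statistic` and `area_under` are hypothetical.

```python
def chi_square_statistic(observed):
    # observed: contingency table of counts, given as a list of rows.
    # Concept: compare each observed count with the count expected
    # if the two categorical variables were independent.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat


def area_under(f, a, b, steps=10_000):
    # Concept: a definite integral is an area, approximated here by thin trapezoids.
    width = (b - a) / steps
    inner = sum(f(a + k * width) for k in range(1, steps))
    return width * ((f(a) + f(b)) / 2 + inner)


table = [[30, 10], [20, 40]]  # invented counts for two binary variables
chi2 = chi_square_statistic(table)
area = area_under(lambda x: x ** 2, 0.0, 3.0)  # exact answer is 9

print(round(chi2, 3), round(area, 3))
```

Knowing that a large statistic signals a departure from independence, and that the second number is an area, is the part worth carrying around in your head.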

I actually find building programs to be an incredibly accurate analogy for this. When learning to program, it becomes clear early on that trying to learn every implementation of every function is impossible. A much more efficient strategy is to understand the inputs and outputs so that you can piece together snippets of code to make things work.

[Image by the author]

Even in cases where you don’t use Google or StackOverflow, courses like fastai abstract the vast majority of the implementation away so that you can build an end-to-end framework of understanding first (in fastai’s case, build an end-to-end model), and only afterwards go back to understand the fundamental details that underlie the abstractions.

In this way, learning the concepts as opposed to the calculations is an application of context-based learning: the contextual framework is built up first, so that when you do need to learn the calculations, each one has a proper place to go.

The Context of the Sub-Disciplines

Following the context-based learning approach, once we have figured out the sub-disciplines of data science, we should dig into their context to understand how they fit into the overall scope of the field.

Computer Science

Why are all data science projects so coding-heavy?

Modern statistics dates back to the 19th century, yet for a long time its application was confined to small samples, as there was no efficient means of organizing large amounts of data and calculating parameters. The computer was that means.

Furthermore, the advent of GPU parallel processing enabled machine learning models to train far faster, in some cases by orders of magnitude. In essence, incredibly powerful tools for statistics became accessible via the computer, hence the heavy emphasis on coding.

FURTHER Qs: What programming languages are the most important for data science? How much programming do I need for data science?

Statistics

Why is statistics important for data science?

Given that much of data science is essentially computational statistics, this field lays the groundwork and provides the toolset for rigorous mathematical analysis of data.

FURTHER Qs: Just what the hell is all this talk about Bayes? What specific statistics libraries do data scientists use?

Linear algebra

What is linear algebra and how does it relate to data science?

Linear algebra is, at its simplest, the study of linear equations. Multiple linear equations stacked together can be expressed as a matrix. Matrices, collections of numbers arranged in rows and columns, are essentially equivalent to tabular data (data in a table). Moreover, image data is nothing but a multi-dimensional array of numbers: a grid of pixels, each of which is a tuple of channel values (i.e. a list of lists of numbers). This is why a good understanding of linear algebra provides an understanding of the structure of data itself.
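
The correspondence described above can be sketched with nested Python lists. The numbers, the `weights` vector, and the helper `matvec` are all invented for illustration; in practice a library such as NumPy would hold these arrays, but plain lists are enough to show the structure.

```python
# A small table: each row is one observation, each column is one feature.
table = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]


def matvec(matrix, vector):
    # Matrix-vector product: one linear equation (a dot product) per row.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


weights = [0.5, 2.0]  # hypothetical coefficients of the linear equations
outputs = matvec(table, weights)

# A 2x2 RGB image: rows of pixels, each pixel a tuple of three channel values.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
shape = (len(image), len(image[0]), len(image[0][0]))  # (height, width, channels)

print(outputs, shape)
```

Seen this way, a table is a two-dimensional array and a color image is a three-dimensional one, which is why the same matrix operations apply to both.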

FURTHER Qs: What is a tensor? How is linear algebra used in deep learning?

Machine Learning & Calculus

What is the link between calculus and machine learning?

A critical component of calculus is the study of optimization. Since the objective of most machine learning algorithms is to minimize an error function, calculus provides the tools to understand how that minimization occurs.
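
As a small illustration of that minimization, the sketch below fits a one-parameter model y = w * x by gradient descent on the mean squared error. The data points, learning rate, and step count are assumptions chosen so that the toy example converges; the derivative in `mse_gradient` is where the calculus comes in.

```python
def mse(w, data):
    # Error function: mean squared difference between predictions w*x and targets y.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_gradient(w, data):
    # Derivative of the error with respect to w:
    # d/dw mean((w*x - y)^2) = mean(2 * x * (w*x - y))
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)


def gradient_descent(data, w=0.0, learning_rate=0.05, steps=200):
    # Repeatedly step against the gradient, i.e. downhill on the error surface.
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, data)
    return w


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # invented points on the line y = 3x
w = gradient_descent(data)

print(round(w, 4), round(mse(w, data), 8))
```

The derivative tells the algorithm which direction reduces the error, and the learning rate controls how far each step goes.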

FURTHER Qs: What is gradient descent? What is back-propagation? Why is calculus involved in it?

Next Steps

Ask yourself conceptual questions. Lots of conceptual questions. These questions will vary from person to person, as their aim should be to patch the gaps in your knowledge of how data science fits into your overall understanding.

Get creative. A colleague of mine mentioned that visualization maps really helped her understand the context of AI, machine learning, and deep learning and how they all fit together. Similarly, use maps and flowcharts to understand any topics in data science you’re currently struggling to piece together.

[Image by the author]

Once you’re armed with a strong contextual understanding of data science, go ahead and dig deep into the nuances of various supervised algorithms, the best practices for data preprocessing, or the creation of beautiful dashboard visualizations with Tableau.

Just try to make sure every new concept is put into context along the way.

Translated from: https://towardsdatascience.com/the-best-way-to-start-learning-data-science-is-to-understand-its-context-751e917e655e
