What Is Data Science?

Data Science is an interdisciplinary field that uses a combination of code, statistical analysis, and algorithms to gain insights from structured and unstructured data.

Let’s break this down.

We’re all kind of familiar with data. It’s stored information. Anything we read online is data, and anything we do that gets recorded can become a data point. So a “data scientist” is someone who works with data and uses a structured approach to find insights in a set of data. Data scientists do this in any number of fields, from healthcare to marketing to the medical sciences.

The focus of a data scientist is on mathematical models: statistics and algorithms. An algorithm can be defined as “a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.” You can think of an algorithm as a set of steps to follow to solve a problem, like the sequence of moves that solves a Rubik’s cube. If you think back to high school algebra, you might remember the formula for a line on a graph:

y = mx + b

Here m is the slope of the line and b is the y-intercept, which is the point where the line crosses the y-axis. If you start with two data points, you can use this basic algebraic equation to work out m and b. You can then predict the “y” value for any given “x” value.

Plugging the two points into the formula gives us the specific equation of that line.

That equation indicates that if we have an “x” value of 1, the algorithm predicts a “y” value of 2.1.
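
Here is a minimal Python sketch of the two-point calculation described above. The author's original points and the resulting equation were in an image that did not survive. The points (2, 3.2) and (4, 5.4) used below are therefore hypothetical. They were chosen so that the line predicts y = 2.1 at x = 1, matching the example. The function names `line_from_two_points` and `predict` are also illustrative.

```python
def line_from_two_points(p1, p2):
    """Return (slope m, intercept b) of the line y = mx + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as y = mx + b")
    m = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - m * x1            # intercept: solve y1 = m*x1 + b for b
    return m, b


def predict(m, b, x):
    """Evaluate y = mx + b at a given x."""
    return m * x + b


# Hypothetical data points (the article's original points are not available).
m, b = line_from_two_points((2, 3.2), (4, 5.4))
print(f"y = {m:.2f}x + {b:.2f}")                # y = 1.10x + 1.00
print(f"x = 1 -> y = {predict(m, b, 1):.2f}")   # x = 1 -> y = 2.10
```

This works because two points with different x values always determine exactly one line. With real, noisy data, you need more points and a fitting method.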

This is basically the kind of problem a data scientist tries to solve, but with questions like what will make a customer purchase a product or how a stock portfolio will perform over time. Those questions are much more complicated and involve far more factors than a simple algebra problem. Data scientists use code and other technologies to build these models, and they constantly work to improve their predictions. They work at companies like Spotify, Yelp, and Google.
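
Real data rarely falls exactly on one line. So instead of solving from two points, a common first step is to fit the line to many observations with ordinary least squares. This method picks the m and b that minimize the sum of squared prediction errors.

The sketch below illustrates that standard technique under stated assumptions; it is not presented as the author's method. The data values and the function name `fit_line_least_squares` are hypothetical. Real problems such as purchase prediction would also add many more input factors, each with its own coefficient.

```python
def fit_line_least_squares(xs, ys):
    """Ordinary least squares fit of y = mx + b; returns (m, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b


# Hypothetical noisy observations that roughly follow a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
m, b = fit_line_least_squares(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 1.97x + 0.09
```

In Python 3.10 and later, the standard library's `statistics.linear_regression(x, y)` computes the same slope and intercept.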

The thing about Data Science, though, is that it is a new field that is still being defined. While every company seems to want a Senior Data Scientist, the job descriptions vary enormously. It’s also a weird field where some companies want a highly experienced person with a PhD, while others are excited to hire someone at the entry level, perhaps someone who has completed a bootcamp. One thing I like about this field is that if you study Data Science, you learn a bunch of skills that carry over to other, similar roles. For example, a Data Analyst might need to know about statistics, data cleaning, Big Data, and APIs. A Data Engineer should understand the same things, along with what Data Scientists need in order to be supported in their work. A Data Engineer should also be able to code efficiently in multiple languages (I use Python and SQL), understand Amazon Web Services or another cloud-based platform, and know other data fundamentals.

Needless to say, there are a lot of opportunities and directions you can go in if you choose to learn Data Science. As a person working in data, you can provide insight into complex information about customers and help define how ethical your company’s analytics or machine learning models are. You hold a lot of unique and interesting power. The work also requires you to constantly learn new things, solve new problems, and troubleshoot odd inconsistencies.

If this is something you are interested in learning more about, you can check out TechCultivator on LinkedIn and Instagram. They are a company dedicated to helping underrepresented folks get rewarding data science and software development jobs through skill building, mentorship, networking and community.

Source: https://medium.com/@edithiyerhernandez/what-is-data-science-678feaa8a282