What Is Data Science: 7 Things You Must Know If You Want to Get into Data Science

No way. No freaking way I was getting into data science any time soon… That is exactly what I thought a year ago.

A little bit about my data science story: I am a complete beginner in the data science field, and exactly 6 months ago I was desperately looking to switch from digital marketing to data science. You may be wondering: why desperately? Well, because I got overconfident in my job-hunting abilities and resigned from my previous job without a backup. I started panicking during the last few days of my notice period. All the courses and tutorials available online, plus the sheer number of topics I had to cover just to get started in data science, were overwhelming. They say time flies, and boy, do I agree! I am already half a year into my first data science job, and I cannot wait to share everything I have learned and experienced with you. If you are currently in the shoes I was in, read on for insights and motivation.

1. Practice more than you read:

I remember going through every single data science boot camp course available on Udemy and buying a couple of top-rated courses that covered Python, SQL, Tableau and machine learning (pro tip: don’t go for generic “data science boot camps”, because these courses don’t cover important topics in depth; instead, try tool-specific boot camps such as a Python boot camp, SQL boot camp or deep learning boot camp). The courses were all detailed and, honestly, very helpful. But even after more than 50 hours of lectures and many assignments, I still had no real data science experience. Even the basic analysis tasks in my first month on the job were fairly difficult for me, and I was really struggling to meet deadlines.

[Image credit: Pinterest]

Looking back, I feel that I focused too much on learning and too little on practicing. I listened to every lecture, each of which covered a new topic, did some teeny-tiny assignments and thought I was doing everything the right way. I see it very differently now. Learning should happen through practicing and implementing new ideas. That is when you make mistakes, notice new things, research how to code a better solution and, you know, really learn. That is exactly what happened after I started my current job, since I had to work on and implement new ideas every day. Trust me, that is when I picked up real skills. If you are in the online-course phase, set aside some time to build projects that implement the topics you have learned.

2. Coding skills:

[Image credit: https://changhsinlee.com/]

Most people trying to enter this field have a slight misconception that data science involves less coding than software engineering. There is a little truth to it: Python, which is widely used in data science, has ready-made libraries for almost every type of algorithm and operation. Though these libraries are very helpful, there is only so much they can do. I, for one, thought data science was all about data analysis, plots, model fitting, prediction and accuracy metrics. Those things are of course part of it, but software engineering is another huge part. For example, when you build a production-level product recommendation pipeline, you will have to work on many things such as SQL scripts, data sync, training, tuning, prediction, evaluation frameworks, unit testing, logging, dashboards, an admin panel, model deployment, version control and much more. All of this together involves a hell of a lot of critical thinking and coding, and it is the kind of work you will do in the long run, or maybe even in your first few months! I am not saying you need to know everything about coding, but you will need some level of coding proficiency, and it will serve you well.
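
To make the "software engineering" side a little more concrete, here is a minimal Python sketch of one tiny piece of such a pipeline: an evaluation metric wrapped in logging and unit tests. The metric (precision@k), the logger name `recommender.evaluation` and the test cases are illustrative assumptions, not the author's actual pipeline.

```python
import logging
import unittest

# Hypothetical logger name for the evaluation step of a recommendation pipeline.
logger = logging.getLogger("recommender.evaluation")


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    score = hits / k
    # Record each evaluation; visible once logging is configured at INFO level.
    logger.info("precision@%d = %.3f (%d hits)", k, score, hits)
    return score


class PrecisionAtKTest(unittest.TestCase):
    """Unit tests that pin down the metric's behaviour before it ships."""

    def test_all_hits(self):
        self.assertEqual(precision_at_k(["a", "b"], {"a", "b"}, 2), 1.0)

    def test_partial_hits(self):
        # Only "a" is in the top 2, so 1 hit out of 2.
        self.assertEqual(precision_at_k(["a", "x", "b"], {"a", "b"}, 2), 0.5)

    def test_rejects_non_positive_k(self):
        with self.assertRaises(ValueError):
            precision_at_k(["a"], {"a"}, 0)
```

Saved as a module, the tests run with `python -m unittest <module_name>`, and calling `logging.basicConfig(level=logging.INFO)` makes the log lines visible. Even a ten-line metric ends up surrounded by tests and logging once it is part of a production pipeline.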

3. No pressure to learn every single data science tool:

There are far too many data science tools on the market, and it can be quite confusing to figure out where to start. The best option is to learn one data-science-friendly programming language, one database tool and one visualization tool. This is a good way to begin, and it matches the basic requirements of many entry-level roles. When you are just laying the foundation, don’t pressure yourself to learn too many tools. Instead, take things slowly: understand the basics and explore topics in depth in whichever tool you learn. You will pick up many more tools once you are on the job, either because projects require them or simply while working on your passion projects.

[Image credit: Udemy]

When I was job hunting, I started with Python, SQL and Tableau. Nothing more. Now I also know how to work with a few other tools such as Spark, HBase, Kibana, Dash, Elasticsearch and Logstash, and I am sure I will have to learn new ones in the coming days. The point is: learn each tool with complete clarity about how it will serve your needs.

4. You are ready to take interviews:

Tell yourself that whenever you feel like skipping an interview call or meeting because your brain is telling you that you are not going to make it. I have lost count of the times I learned something new while attending an interview, whether about the data science industry, new products or just a concept. I am not suggesting that you attend interviews at random just to learn things; that would obviously waste the poor interviewer's time. But data science is a vague term, and so are the job requirements for every data science role. If you insist on ticking every single job requirement before attending an interview, you may never feel ready.

[Image credit: Giphy]

The preparation phase can also be a long one, depending on your learning speed and prior knowledge, and it is very easy to get stuck in it because there are so many topics to cover. Set goals for your interview preparation, and as you achieve them, start looking for interview opportunities. Every time you fail an interview, you will discover a particular area you need to improve or a new market requirement you need to learn. And that, my friend, will help you in your next interviews.

5. Ideal companies to apply to for data science roles:

Beginners are usually flexible about roles and companies when applying for jobs. But if you are wondering what type of company you should apply to for a data science role, the answer is completely subjective. Let us look at product-based and service-based companies from a data science perspective. Service companies usually work on one-time data analyses or prototypes, whereas product companies involve rigorous software development, with data analysis being just one part of it. In service companies, Python, R, PowerPoint and Excel will do the job for you most days, whereas product companies will want you to work with whatever tool the job requires. Basically, product companies involve a lot of software engineering in addition to data analysis.

Product companies work on projects that improve their products by incorporating data science into them, build new data-based products such as product recommendation engines and AI-based chatbots, or simply use analytics to make better decisions within the organization. Service companies work on analytics projects driven purely by client requirements. So, as I said, it is up to your interests. Choose wisely!

6. Data Science can be frustrating:

Data problems are very interesting to work on, but some can be just as frustrating. One of the difficult aspects of the work is simply waiting patiently for good results. Often you will not know whether you are heading in the right direction. There are many unknowns, and a lot of things in your project will require plain trial and error to arrive at an optimal solution. As they say, it is all fun and games until you reach the hyper-parameter tuning part of your model :)
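
Much of that trial and error boils down to trying candidate settings and keeping whichever scores best. The toy Python sketch below assumes a simple exponential smoothing forecaster (a stand-in chosen for illustration, not anything from the author's project) and grid-searches its smoothing factor `alpha` by one-step-ahead mean squared error. The sales numbers are made up.

```python
def one_step_squared_errors(series, alpha):
    """Squared one-step-ahead errors of simple exponential smoothing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]  # initialise the smoothed level with the first point
    errors = []
    for value in series[1:]:
        errors.append((value - level) ** 2)          # forecast for this step is the current level
        level = alpha * value + (1 - alpha) * level  # update the level after seeing the value
    return errors


def mean(values):
    return sum(values) / len(values)


def grid_search_alpha(series, candidates):
    """Try every candidate alpha; return the best one and all scores."""
    scores = {alpha: mean(one_step_squared_errors(series, alpha)) for alpha in candidates}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


if __name__ == "__main__":
    sales = [10, 12, 11, 13, 12, 14, 13, 15]  # made-up numbers for illustration
    best, scores = grid_search_alpha(sales, [0.1, 0.3, 0.5, 0.7, 0.9])
    for alpha, mse in sorted(scores.items()):
        print(f"alpha={alpha:.1f}  MSE={mse:.3f}")
    print("best alpha:", best)
```

In practice you would usually score candidates on held-out validation data, and libraries such as scikit-learn provide ready-made search utilities (for example `GridSearchCV`), but the loop is the same: try, evaluate, keep the best, try again.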

Most of us build a proof of concept (POC) before implementing any solution. But sometimes even a POC fails to reveal hiccups you will face during the actual task. For example, at work we once spent an entire month researching and implementing a solution for our pipeline, and it eventually didn’t work out. We had to start all over again, which caused a huge delay in a project that was supposedly going well. The key takeaway from a couple of incidents like this: always set clear goals, evaluate your POC thoroughly, and when you are stuck at one point for too long, remember to try fast, fail fast, evaluate fast and try again fast. Speed is super important for good progress.

7. Your storytelling skills will matter a lot:

You will most likely be dealing with customers from non-technical backgrounds. Your organization's leaders may not be data scientists, and your own teammates might come from diverse backgrounds (pure mathematicians, API users, etc.). These are the people who will recognize your work and add value to it.

It is very important to communicate your thoughts, ideas, analyses and results to your audience in an interactive and understandable way. I clearly remember struggling in my first team meeting with the CEO, where we had to explain the progress of our projects and discuss use cases and future AI goals. That is when it hit me that numbers and analytical skills alone are not enough. A good story explaining your analysis can get your manager interested. A story explaining how a particular data science solution solves a pain point can get your customer interested. Different stories have different impacts on different people. Frame your story carefully with data science elements such as visualizations, dashboards and reports, and give it everything you have when you deliver it.

Final Thoughts:

Data science is not rocket science. If I can do it, then you can do it too! There is no better time than now to enter this fast-growing field. That being said, it is definitely a bit tough to keep up with all the new developments in this field and with the competition. But what matters is that we learn, implement, make mistakes and grow consistently. Happy analyzing :)

Source: https://medium.com/swlh/7-things-you-must-know-if-youre-trying-to-enter-data-science-2a9a531750e0
