Interviewing for a Data Science Internship: How to Prepare

Unfortunately, on this occasion, your application was not successful, and we have appointed an applicant who…

Sounds familiar, right? After all the gruelling hours I spent on interview preparation, one rejection came after another. Although I was passing the first few interview stages, the face-to-face stages did not go well for me. “What a spectacular failure I am”, I thought.

I started looking for ways to improve. I identified a few areas that are usually overlooked but can have a huge impact on the interview outcome. Working on them helped me improve and get the job I wanted!

Get The Basics Right

DS internships are usually quite competitive, and a single red flag can be enough for the recruiter to reject you straightaway. One of these red flags is weak foundations: data science is a field where you are required to have good mathematical and programming knowledge.

How can you improve? For data science theory, I recommend getting a good mathematical understanding of the most common algorithms. There are two books that I usually recommend: Pattern Recognition and Machine Learning, and A First Course in Machine Learning. Both contain in-depth mathematical explanations of machine learning algorithms, which will help you smash DS interview questions to pieces!

Depending on the company, you might also be asked programming questions. They are often not that hard, but given the stress and time constraints, you really need to master them as well. Expect anything from sorting and recursion to data structures. It’s good to start practicing these questions as early as possible. To get a good understanding of how to approach coding questions, I recommend going through the book Cracking the Coding Interview. To get more practical experience, try HackerRank or LeetCode.
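As an illustration of the kind of question mentioned above, here is a minimal sketch of merge sort, which touches both sorting and recursion. The choice of merge sort and the function name are assumptions made for this example; the article does not name specific questions.

```python
def merge_sort(items):
    """Return a new sorted list using recursive merge sort."""
    # Base case: a list of zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)

    # Recursive case: sort each half independently.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


result = merge_sort([5, 2, 9, 1, 5, 6])
```

In an interview it usually helps to state the base case and the merge step out loud before writing them, and to mention the O(n log n) running time.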

Glassdoor is Your Best Friend

You can get a good feel for a company’s culture and atmosphere from its Glassdoor reviews. This can give you a good indication of whether the company is a good fit for you. If, for example, a company seems to have a really toxic atmosphere, maybe it would be better to withdraw the application and spend more time preparing for interviews at other companies. What’s the point in interviewing with companies that you don’t really want to work for?

You can also find some really useful information about the interview structure and the type of questions they ask. Some companies literally ask the same set of questions every time! I am not sure why they do that, but when it happens you will notice the same questions repeated across the Glassdoor reviews. You can use this to your advantage and learn them by heart.

Easy Interview Questions are NOT Easy

Imagine a situation where the interviewer asks: what is linear regression?

You can answer either:

It is a linear approach that models the relationship in data between dependent and independent variables.

Or:

It is a linear approach that models the relationship in data between dependent and independent variables. The model’s parameters can be derived using the ordinary least squares approach, and the general equation works on multi-dimensional data. It is an algorithm that is simple, fast, and interpretable. However, it has certain caveats such as …

Do you see what I mean? By asking a simple-looking question, the interviewer can test two things. Firstly, whether you have the basic knowledge (obvious). Secondly, how deep your understanding is and how inquisitive you are when studying a topic. This ability is crucial in the data scientist skillset, as you will often have to work with new tools and read research papers. If you don’t analyze a subject thoroughly and fail to understand its limitations and capabilities, you are on a straight path to an unsuccessful project.
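To make the deeper answer more concrete, here is a minimal sketch of ordinary least squares for the single-feature case, where the slope is the covariance of x and y divided by the variance of x. The function name, the toy data, and the outlier demonstration are assumptions added for illustration. Sensitivity to outliers is one commonly cited caveat; it is not necessarily the one the answer above had in mind.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for one feature:
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
clean_intercept, clean_slope = fit_simple_ols(xs, ys)

# Replace the last point with an outlier and refit:
# a single extreme value changes the fitted line substantially.
ys_outlier = ys[:-1] + [31.0]
outlier_intercept, outlier_slope = fit_simple_ols(xs, ys_outlier)

print(clean_intercept, clean_slope)
print(outlier_intercept, outlier_slope)
```

Being able to write something like this on a whiteboard, and to explain why one point shifts the fit so much, is the kind of depth the second answer signals.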

Showcase Projects. Quality or Quantity?

TL;DR: Quality!

The painful truth is that nobody cares about the endless Jupyter notebooks that you created for your 100+ mini-projects. Don’t get me wrong: they are still a great way to experiment with new models and data. But, most likely, they won’t impress the interviewer.

There is much more to data science than creating dozens of untested machine learning models in a single file. In a real-life scenario, the code needs to be tested, packaged, documented, and deployed using internal servers or cloud services.

My advice? Go for quality and aim to create ~3 bigger projects that will impress the interviewers. Here are some tips that you can follow:

  • Find a real-world dataset that requires a lot of preprocessing and EDA

  • Make your code modular: create separate classes for models, data preprocessing, and end-to-end pipelines

  • Use test-driven development (TDD) while developing packaged code

  • Work with Git and continuous integration services such as CircleCI

  • Expose the model to the user through an API, e.g. with Flask for Python

  • Document the code using Sphinx and adhere to code styling guidelines (e.g. PEP-8 for Python)
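The modularity and testing tips can be sketched roughly as follows. This is only an illustration using the standard library: the class names (Standardizer, ThresholdModel, Pipeline), the toy threshold model, and the test are all hypothetical, and a real project would more likely build on a library such as scikit-learn and run its tests with a test runner.

```python
from statistics import mean, pstdev


class Standardizer:
    """Preprocessing step: rescale values to zero mean and unit variance."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so constant inputs do not cause division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


class ThresholdModel:
    """Toy model: predicts 1 when the standardised value is above zero."""

    def predict(self, values):
        return [1 if v > 0 else 0 for v in values]


class Pipeline:
    """End-to-end pipeline that chains the preprocessing step and the model."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def fit(self, values):
        self.preprocessor.fit(values)
        return self

    def predict(self, values):
        return self.model.predict(self.preprocessor.transform(values))


def test_pipeline_flags_values_above_training_mean():
    """In TDD this test would be written first, before the classes above."""
    pipeline = Pipeline(Standardizer(), ThresholdModel()).fit([1.0, 2.0, 3.0])
    assert pipeline.predict([0.5, 2.5]) == [0, 1]


test_pipeline_flags_values_above_training_mean()
```

Keeping each responsibility in its own class means each piece can be tested, documented, and swapped out independently, which is what makes the later steps (CI, an API layer, generated docs) straightforward to add.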

A really good course on ML model deployment was created on Udemy by data scientists from Babylon Health and Train In Data. You can find it here.

Bonus: CV Template

I am a big fan of 1-page CVs for data science internships. One page helps me keep the CV simple and clear, without redundant information. I used to have a Word template, but I was losing a lot of time modifying it. Whenever I removed or added some information, the formatting instantly blew up, making my CV look like the Enigma code 😆

Anyway, I found a nice-looking Overleaf CV template that I’ve been using ever since. It is simple, clear, and most importantly, it’s rendered with modular LaTeX code that makes formatting a painless task. The link to the CV template is here.

About Me

I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)

Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:

  • LinkedIn

  • GitHub

  • Personal Website

Translated from: https://towardsdatascience.com/interviewing-for-data-science-internship-how-to-prepare-f6b9c2c7fa97
