20 More Hot Data Tools and What They DON’T Do

In the past few months, the data ecosystem has continued to burgeon as some parts of the stack consolidate and as new challenges arise. Our first attempt to help stakeholders navigate this ecosystem highlighted 25 Hot New Data Tools and What They DON’T Do — clarifying specific problems the featured companies and projects did and did NOT solve.

This effort was positively received by the data science, engineering and analytics communities, and spurred more engagement than we originally anticipated. Further, we were flattered to see the original post motivate other thought-provoking pieces such as 20 Hot New Data Tools and their Early Go-to-Market Strategies.

Taking it Further

Regardless, we quickly recognized our original post did not go far enough as we received dozens of emails, Twitter messages and Slack DMs about other solutions that were not covered. We had shed light on a small corner of the expanding universe of data tools and platforms, yet there was an opportunity to cover even more.

Although we cannot chronicle every additional data tool in just one follow-up post, here we continue our efforts to cultivate this ecosystem by highlighting a few more. The creators of these tools not only occupy meaningful parts of the ever-evolving modern data stack, they also graciously responded to our requests to help us understand where they fit in.

They sound off here in their own words.

More Tools and Responses

  1. Shipyard: Shipyard is a workflow orchestration platform that helps teams quickly launch, monitor, and share data solutions without worrying about infrastructure management. It lets users create reusable blueprints, share data seamlessly between jobs, and run code without any proprietary setup, all while scaling resources dynamically. Shipyard is NOT a no-code tool and does not support data versioning or data visualization.

  2. Count: Count is a data notebook that replaces dashboards for reporting and self-service, and supports data transformation. Count is uniquely good at team collaboration, enabling technical and non-technical users to work within the same notebook. Count is NOT a data science notebook.

  3. Castor: Castor is uniquely good at organizing information about data to support data discovery, GDPR compliance, and knowledge management. Through a plug-and-play solution, Castor builds a comprehensive and actionable map of all data assets. Castor is NOT a data visualization or BI tool.

  4. Census: Census is uniquely good at syncing data models from a warehouse to business tools like Salesforce. It complements existing warehouses, data loaders & transform tools to enable data teams to drive business operations. It is NOT a no-code tool nor does it automagically model your data; it relies on analysts writing models in SQL.

  5. Iteratively: Iteratively is a schema registry that helps teams collaborate to define, instrument, and validate their analytics. With Iteratively, you can ship high-quality analytics faster and prevent common data quality & privacy issues that undermine trust. Iteratively is NOT a BI tool, data pipeline, or transformation tool.

  6. StreamSQL: StreamSQL handles deploying, versioning, and sharing model features. Using your definitions, it generates features for both serving and training. Its registry facilitates re-using features across teams and models. StreamSQL does NOT do model management and is completely agnostic to what you do with the features once you get them.

  7. Xplenty: Xplenty is a cloud-based ETL solution providing simple visualized data pipelines for automated data flows across a wide range of sources and destinations. It is uniquely good at ingesting large volumes of data, performing code-free data transformations, and scheduling workflows. Xplenty does NOT do event streaming.

  8. Vectice: Vectice is uniquely good at tracking, documenting, and organizing all AI assets (e.g., datasets, features, models, experiments, dashboards, notebooks) and the underlying domain knowledge needed to successfully manage and scale enterprise AI initiatives. Vectice does NOT provide any runtime or computational environment.

  9. Snowplow Analytics: Snowplow is a streaming behavioral data engine that is uniquely good at generating event data from dedicated web/mobile/server SDKs, enhancing that data and delivering it to your data warehouse. Snowplow is NOT a data integration (ELT) tool, nor a general streaming framework, nor a BI tool.

  10. Datafold: Datafold is uniquely good at comparing datasets in a SQL data warehouse or across data warehouses. It enables running “git diff” on a table of any size. Datafold is NOT a database itself (it works on top of existing infrastructure) and it does NOT work with files.

  11. Splitgraph: Splitgraph is a tool for building, extending, versioning, and sharing SQL databases that is uniquely good at enhancing existing tools. Splitgraph also features a data catalogue including 40K open datasets that can be queried (and joined) with any SQL client. Splitgraph is NOT a database.

  12. Datacoral: Datacoral is uniquely good at automatically generating data ingestion and transformation pipelines from SQL-based declarative specifications, and automatically capturing and displaying schema level lineage. Datacoral plays nice with data ingestion tools like Segment, and workflow management tools like Airflow. Datacoral is NOT a data warehouse or a query engine.

  13. Apache Arrow: Apache Arrow is uniquely good as a language-independent standard for fast in-memory analytical processing and efficient interprocess transport (with minimal overhead) of large tabular datasets. While intended as a computational foundation for data frame projects, it is NOT a replacement for end-user facing tools like pandas.

  14. Datasaur: Datasaur is built to support NLP labeling via ML-assisted suggestions. It supports workforce management, maintains data privacy, and can be integrated via API to any ML workflow. Datasaur does NOT handle bounding boxes for image/video labeling.

  15. Datakin: Datakin is a DataOps solution that helps guarantee that data pipelines run without disruption and resulting data can be trusted. It does so by automatically discovering data lineage and providing tools to quickly identify and resolve issues. Datakin is NOT a data catalog nor does it replace any existing data infrastructure components (workflow orchestration, data processing, …).

  16. ApertureData: ApertureData is a database for visual data like images, videos, feature vectors, and associated metadata like annotations. It natively supports complex searching and preprocessing operations over media objects, and integrates with cloud-based storage and ML frameworks like PyTorch/TensorFlow. ApertureData does NOT extract metadata or features from images/videos.

  17. Orchest: Orchest is uniquely good at assisting data scientists in interactively building data science pipelines by providing a visual pipeline editing environment in the browser. Pipeline steps are containerized notebooks or scripts. Orchest does NOT replace Jupyter notebooks, provide a no-code tool, or bring its own computational infrastructure.

  18. Gazette: Gazette is an open source streaming platform that breaks down the divide between batch and real-time data, enabling users to build real-time applications with exactly-once semantics. It offers real-time message streams, which are natively and durably stored as regular files in cloud storage. Gazette is NOT an ETL tool or an analytics platform.

  19. Coiled Computing: Coiled excels at scaling data science and machine learning workflows in native Python using Dask, which is familiar, widely adopted, and gives great feedback. Coiled is an opinionated way of bursting to clusters and the cloud while staying in the PyData ecosystem. Coiled/Dask is NOT a database or Kubernetes replacement.

  20. Upsolver: Upsolver is a cloud-native solution for integrating structured and unstructured data on cloud storage. It utilizes a visual SQL interface for quick and easy data transformation. Upsolver is NOT a Platform as a Service solution that requires developers to write additional code and learn low-level concepts to process data.
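
Several of the descriptions above lean on analogies that a few lines of code can make concrete. As one example, item 10 describes Datafold as running “git diff” on a table. The sketch below is only a toy illustration of what a key-based table diff means, written against Python's standard `sqlite3` module; it is not Datafold's API or algorithm. The table names (`users_old`, `users_new`), the columns, and the `diff_tables` helper are all hypothetical. It also loads both tables into memory, which would not hold up for the “any size” tables the description mentions.

```python
import sqlite3


def diff_tables(conn, old, new, key, columns):
    """Compare two tables by key; return (added, removed, changed) key lists.

    Hypothetical helper for illustration only. Table and column names are
    interpolated directly into SQL, so they must come from trusted input.
    """
    cols = ", ".join([key] + columns)
    # Map each key to the tuple of its remaining column values.
    old_rows = {row[0]: row[1:] for row in conn.execute(f"SELECT {cols} FROM {old}")}
    new_rows = {row[0]: row[1:] for row in conn.execute(f"SELECT {cols} FROM {new}")}

    added = sorted(new_rows.keys() - old_rows.keys())      # keys only in the new table
    removed = sorted(old_rows.keys() - new_rows.keys())    # keys only in the old table
    changed = sorted(
        k for k in old_rows.keys() & new_rows.keys() if old_rows[k] != new_rows[k]
    )                                                      # same key, different values
    return added, removed, changed


# Two made-up snapshots of the same table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users_old (id INTEGER PRIMARY KEY, email TEXT, plan TEXT);
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, email TEXT, plan TEXT);
    INSERT INTO users_old VALUES
        (1, 'a@example.com', 'free'),
        (2, 'b@example.com', 'pro'),
        (3, 'c@example.com', 'free');
    INSERT INTO users_new VALUES
        (1, 'a@example.com', 'free'),
        (2, 'b@example.com', 'team'),
        (4, 'd@example.com', 'free');
    """
)

added, removed, changed = diff_tables(
    conn, "users_old", "users_new", "id", ["email", "plan"]
)
print("added:", added, "removed:", removed, "changed:", changed)
```

With these made-up rows, id 4 appears only in the new snapshot, id 3 only in the old one, and id 2 has a different `plan`, so the three lists separate inserts, deletes, and updates much as a line-level diff separates added and removed lines.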

As authors (Sarah, Abe & Pete) we’re collectively brainstorming about how we can extend this effort and create an ever-growing list that helps practitioners find and adopt the right tools, founders align with the best partners, and investors map companies to their investment theses. We look forward to hearing your thoughts on the best medium to continue this exploration with the support of the community.

Translated from: https://towardsdatascience.com/20-more-hot-data-tools-and-what-they-dont-do-46bc365bea74
