
Subscribe to papers in your field: follow 晓理紫 on WeChat (VX) for daily paper updates. If you find this useful, please share it with others who may need it; thank you for your support. Categories: large language models (LLM), vision models (VLM), diffusion models, visual navigation, embodied intelligence and robotics, reinforcement learning, open-vocabulary detection and segmentation. [晓理紫] Daily paper digest (with links to source code or project pages).

Embodied Artificial Intelligence

Title: Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Authors: Yuanchen Ju, Kaizhe Hu, Guowei Zhang
Abstract: Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward open-world embodied intelligence. For human beings, this ability is rooted in the understanding of semantic correspondence among objects, which naturally transfers the interaction experience of familiar objects to novel ones. Although robots lack such a reservoir of interaction experience, the vast availability of human videos on the Internet may serve as a valuable resource, from which we extract an affordance memory including the contact points. Inspired by the natural way humans think, we propose Robo-ABC: when confronted with unfamiliar objects that require generalization, the robot can acquire affordance by retrieving objects that share visual or semantic similarities from the affordance memory. The next step is to map the contact points of the retrieved objects to the new object. While establishing this correspondence may present formidable challenges at first glance, recent research finds it naturally arises from pre-trained diffusion models, enabling affordance mapping even across disparate object categories. Through the Robo-ABC framework, robots may generalize to manipulate out-of-category objects in a zero-shot manner without any manual annotation, additional training, part segmentation, pre-coded knowledge, or viewpoint restrictions. Quantitatively, Robo-ABC significantly enhances the accuracy of visual affordance retrieval by a large margin of 31.6% compared to state-of-the-art (SOTA) end-to-end affordance models. We also conduct real-world experiments of cross-category object-grasping tasks. Robo-ABC achieved a success rate of 85.7%, proving its capacity for real-world tasks.
[Downlink:] http://arxiv.org/abs/2401.07487v1

Title: AMC24 A Novel Stiffness Modulation Mechanism for Energy Efficient Variable Stiffness Actuators
Authors: Sariyildiz Emre
Abstract: This paper presents a new stiffness modulation mechanism that enables infinite-range stiffness modulation in a fast manner. The proposed stiffness modulation mechanism can help improve many robot-environment interaction applications such as human-robot collaboration and robotic rehabilitation.
[Downlink:] http://arxiv.org/abs/2401.07430v1

Reinforcement Learning (RL)

Title: Contrastive Active Inference
Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
Abstract: Active inference is a unifying theory for perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited because of the shortcomings in scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden in learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare to reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the case of distractors in the environment and that our method is able to generalize goals to variations in the background. Website and code: https://contrastive-aif.github.io/
[Downlink:] http://arxiv.org/abs/2110.10083v4
[Project:] https://contrastive-aif.github.io/

Title: Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning
Authors: Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori
Abstract: We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx can efficiently scale to thousands of simultaneous simulations over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-100x faster than existing implementations available in Python. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at http://github.com/sotetsuk/pgx.
[Downlink:] http://arxiv.org/abs/2303.17503v4
[GitHub:] http://github.com/sotetsuk/pgx
(A batched-rollout sketch based on the project's documented usage appears at the end of this digest.)

Title: Efficient Reinforcement Learning with Decoupling Exploration and Utilization
Authors: Jingpu Yang, Qirui Zhao, Helin Wang
Abstract: Deep neural network (DNN) generalization is limited by the over-reliance of current offline reinforcement learning techniques on conservative processing of existing datasets. This method frequently results in algorithms that settle for suboptimal solutions that only adjust to a certain dataset. Similarly, in online reinforcement learning, the previously imposed punitive pessimism also deprives the model of its exploratory potential. Our research proposes a novel framework, Optimistic and Pessimistic Actor Reinforcement Learning (OPARL). OPARL employs a unique dual-actor approach: an optimistic actor dedicated to exploration and a pessimistic actor focused on utilization, thereby effectively differentiating between exploration and utilization strategies. This unique combination in reinforcement learning methods fosters a more balanced and efficient approach. It enables the optimization of policies that focus on actions yielding high rewards through pessimistic utilization strategies, while also ensuring extensive state coverage via optimistic exploration. Experiments and theoretical studies demonstrate that OPARL improves agents' capacities for application and exploration. In most tasks of the DMControl benchmark and the MuJoCo environment, OPARL performed better than state-of-the-art methods. Our code has been released at https://github.com/yydsok/OPARL
[Downlink:] http://arxiv.org/abs/2312.15965v2
[GitHub:] https://github.com/yydsok/OPARL
(An illustrative ensemble-critic sketch appears at the end of this digest.)

Title: CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design
Authors: Zeji Yi, Chaoyi Pan, Guanqi He
Abstract: Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVo-MPC), that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVo-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at https://lecar-lab.github.io/CoVO-MPC/.
[Downlink:] http://arxiv.org/abs/2401.07369v1
[Project:] https://lecar-lab.github.io/CoVO-MPC/
(A generic MPPI update sketch appears at the end of this digest.)

Title: Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
Authors: Kun Lei, Zhengmao He, Chenhao Lu
Abstract: Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-o4 leverages diverse ensemble policies to address the mismatch issues between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations using numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline and offline-to-online fine-tuning learning. Our website: https://lei-kun.github.io/uni-o4/.
[Downlink:] http://arxiv.org/abs/2311.03351v3
[Project:] https://lei-kun.github.io/uni-o4/

Title: Learning Interactive Real-World Simulators
Authors: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour
Abstract: Generative models trained on internet data have revolutionized how text, image, and video content can be created.
Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world. We explore the possibility of learning a universal simulator of real-world interaction through generative modeling. We first make the important observation that natural datasets available for learning a real-world simulator are often rich along different dimensions (e.g., abundant objects in image data, densely sampled actions in robotics data, and diverse movements in navigation data). With careful orchestration of diverse datasets, each providing a different aspect of the overall experience, we can simulate the visual outcome of both high-level instructions such as "open the drawer" and low-level controls such as "move by x, y" from otherwise static scenes and objects. We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies, each of which can be deployed in the real world in zero shot after training purely in simulation. We also show that other types of intelligence such as video captioning models can benefit from training with simulated experience, opening up even wider applications. Video demos can be found at https://universal-simulator.github.io.
[Downlink:] http://arxiv.org/abs/2310.06114v2
[Project:] https://universal-simulator.github.io

Open Vocabulary Detection

Title: SAMF: Small-Area-Aware Multi-focus Image Fusion for Object Detection
Authors: Xilai Li, Xiaosong Li, Haishu Tan
Abstract: Existing multi-focus image fusion (MFIF) methods often fail to preserve the uncertain transition region and detect small focus areas within large defocused regions accurately. To address this issue, this study proposes a new small-area-aware MFIF algorithm for enhancing object detection capability. First, we enhance the pixel attributes within the small focus and boundary regions, which are subsequently combined with visual saliency detection to obtain the pre-fusion results used to discriminate the distribution of focused pixels. To accurately ensure pixel focus, we consider the source image as a combination of focused, defocused, and uncertain regions and propose a three-region segmentation strategy. Finally, we design an effective pixel selection rule to generate segmentation decision maps and obtain the final fusion results. Experiments demonstrated that the proposed method can accurately detect small and smooth focus areas while improving object detection performance, outperforming existing methods in both subjective and objective evaluations. The source code is available at https://github.com/ixilai/SAMF.
[Downlink:] http://arxiv.org/abs/2401.08357v1
[GitHub:] https://github.com/ixilai/SAMF

Title: Generative Denoise Distillation: Simple Stochastic Noises Induce Efficient Knowledge Transfer for Dense Prediction
Authors: Zhaoge Liu, Xiaohao Xu, Yunkang Cao
Abstract: Knowledge distillation is the process of transferring knowledge from a more powerful large model (teacher) to a simpler counterpart (student). Numerous current approaches involve the student imitating the knowledge of the teacher directly. However, redundancy still exists in the learned representations through these prevalent methods, which tend to learn each spatial location's features indiscriminately. To derive a more compact representation (concept feature) from the teacher, inspired by human cognition, we suggest an innovative method, termed Generative Denoise Distillation (GDD), where stochastic noises are added to the concept feature of the student to embed them into the generated instance feature from a shallow network. Then, the generated instance feature is aligned with the knowledge of the instance from the teacher. We extensively experiment with object detection, instance segmentation, and semantic segmentation to demonstrate the versatility and effectiveness of our method. Notably, GDD achieves new state-of-the-art performance in the tasks mentioned above. We have achieved substantial improvements in semantic segmentation by enhancing PspNet and DeepLabV3, both of which are based on ResNet-18, resulting in mIoU scores of 74.67 and 77.69, respectively, surpassing their previous scores of 69.85 and 73.20 on the Cityscapes dataset of 20 categories. The source code of GDD is available at https://github.com/ZhgLiu/GDD.
[Downlink:] http://arxiv.org/abs/2401.08332v1
[GitHub:] https://github.com/ZhgLiu/GDD
(A minimal distillation-loss sketch appears at the end of this digest.)

Title: UV-SAM: Adapting Segment Anything Model for Urban Village Identification
Authors: Xin Zhang, Yu Liu, Yuming Lin
Abstract: Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities. Traditionally, governments heavily depend on field survey methods to monitor the urban villages, which however are time-consuming, labor-intensive, and possibly delayed. Thanks to widely available and timely updated satellite images, recent studies develop computer vision techniques to detect urban villages efficiently. However, existing studies either focus on simple urban village image classification or fail to provide accurate boundary information.
To accurately identify urban village boundaries from satellite images, we harness the power of the vision foundation model and adapt the Segment Anything Model (SAM) to urban village segmentation, named UV-SAM. Specifically, UV-SAM first leverages a small-sized semantic segmentation model to produce mixed prompts for urban villages, including mask, bounding box, and image representations, which are then fed into SAM for fine-grained boundary identification. Extensive experimental results on two datasets in China demonstrate that UV-SAM outperforms existing baselines, and identification results over multiple years show that both the number and area of urban villages are decreasing over time, providing deeper insights into the development trends of urban villages and shedding light on vision foundation models for sustainable cities. The dataset and codes of this study are available at https://github.com/tsinghua-fib-lab/UV-SAM.
[Downlink:] http://arxiv.org/abs/2401.08083v1
[GitHub:] https://github.com/tsinghua-fib-lab/UV-SAM

Title: Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning
Authors: Wenjun Qiu, David Lie, Lisa Austin
Abstract: A significant challenge to training accurate deep learning models on privacy policies is the cost and difficulty of obtaining a large and comprehensive set of training data. To address these challenges, we present Calpric, which combines automatic text selection and segmentation, active learning and the use of crowdsourced annotators to generate a large, balanced training set for privacy policies at low cost. Automated text selection and segmentation simplifies the labeling task, enabling untrained annotators from crowdsourcing platforms, like Amazon's Mechanical Turk, to be competitive with trained annotators, such as law students, and also reduces inter-annotator agreement, which decreases labeling cost. Having reliable labels for training enables the use of active learning, which uses fewer training samples to efficiently cover the input space, further reducing cost and improving class and data category balance in the data set. The combination of these techniques allows Calpric to produce models that are accurate over a wider range of data categories, and provide more detailed, fine-grain labels than previous work. Our crowdsourcing process enables Calpric to attain reliable labeled data at a cost of roughly $0.92-$1.71 per labeled text segment. Calpric's training process also generates a labeled data set of 16K privacy policy text segments across 9 data categories with balanced positive and negative samples.
[Downlink:] http://arxiv.org/abs/2401.08038v1
[Project:] https://www.usenix.org/conference/usenixsecurity23/presentation/qiu

Title: Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models
Authors: Dan Jacobellis, Daniel Cummings, Neeraja J. Yadwadkar
Abstract: In the field of neural data compression, the prevailing focus has been on optimizing algorithms for either classical distortion metrics, such as PSNR or SSIM, or human perceptual quality. With increasing amounts of data consumed by machines rather than humans, a new paradigm of machine-oriented compression, which prioritizes the retention of features salient for machine perception over traditional human-centric criteria, has emerged, creating several new challenges to the development, evaluation, and deployment of systems utilizing lossy compression. In particular, it is unclear how different approaches to lossy compression will affect the performance of downstream machine perception tasks. To address this under-explored area, we evaluate various perception models, including image classification, image segmentation, speech recognition, and music source separation, under severe lossy compression. We utilize several popular codecs spanning conventional, neural, and generative compression architectures. Our results indicate three key findings: (1) using generative compression, it is feasible to leverage highly compressed data while incurring a negligible impact on machine perceptual quality; (2) machine perceptual quality correlates strongly with deep similarity metrics, indicating a crucial role of these metrics in the development of machine-oriented codecs; and (3) using lossy compressed datasets (e.g., ImageNet) for pre-training can lead to counter-intuitive scenarios where lossy compression increases machine perceptual quality rather than degrading it. To encourage engagement on this growing area of research, our code and experiments are available at https://github.com/danjacobellis/MPQ.
[Downlink:] http://arxiv.org/abs/2401.07957v1
[GitHub:] https://github.com/danjacobellis/MPQ

Title: SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Authors: Zhengze Xu, Dongyue Wu, Changqian Yu
Abstract: Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single-branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN. SCTNet utilizes a transformer as the training-only semantic branch considering its superb ability to extract long-range context. With the help of the proposed transformer-like CNN block CFBlock and the semantic information alignment module, SCTNet could capture the rich semantic information from the transformer branch in training. During inference, only the single-branch CNN needs to be deployed. We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves new state-of-the-art performance. The code and model are available at https://github.com/xzz777/SCTNet.
[Downlink:] http://arxiv.org/abs/2312.17071v2
[GitHub:] https://github.com/xzz777/SCTNet
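Illustrative sketch for the Pgx entry above: the snippet below shows the batched self-play rollout pattern described in the project documentation, with environments reset and stepped through jax.vmap and driven here by a random policy over legal actions. The API names (pgx.make, env.init, env.step, state.legal_action_mask) follow the project's documented usage but may differ between versions, so treat this as an assumption-laden sketch rather than verified code.

    import jax
    import jax.numpy as jnp
    import pgx  # pip install pgx; API names assumed from the project docs

    env = pgx.make("go_19x19")              # any supported game id
    init = jax.jit(jax.vmap(env.init))      # vectorized reset over a batch of RNG keys
    step = jax.jit(jax.vmap(env.step))      # vectorized transition

    batch_size = 1024
    key = jax.random.PRNGKey(0)
    state = init(jax.random.split(key, batch_size))

    while not (state.terminated | state.truncated).all():
        key, subkey = jax.random.split(key)
        # random policy placeholder: sample uniformly among legal actions
        logits = jnp.where(state.legal_action_mask, 0.0, -jnp.inf)
        action = jax.random.categorical(subkey, logits, axis=-1)
        state = step(state, action)

In practice the random policy would be replaced by a network that consumes state.observation, which is how the paper trains Gumbel AlphaZero on these environments.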

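Illustrative sketch for the OPARL entry: the abstract turns on scoring an exploration actor optimistically and an exploitation actor pessimistically against a critic ensemble. The sketch below shows that idea generically in PyTorch; it is not the authors' implementation, and all names (q_nets, actor_explore, actor_exploit) are placeholders.

    import torch

    def ensemble_q(q_nets, obs, act):
        # stack Q-estimates from an ensemble of critics; each critic returns (batch, 1)
        return torch.stack([q(obs, act).squeeze(-1) for q in q_nets], dim=0)

    def optimistic_actor_loss(actor_explore, q_nets, obs):
        # exploration: push the policy toward actions the ensemble is most hopeful about
        act = actor_explore(obs)
        return -ensemble_q(q_nets, obs, act).max(dim=0).values.mean()

    def pessimistic_actor_loss(actor_exploit, q_nets, obs):
        # exploitation: optimize against the most conservative critic estimate
        act = actor_exploit(obs)
        return -ensemble_q(q_nets, obs, act).min(dim=0).values.mean()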
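Illustrative sketch for the CoVO-MPC entry: CoVO-MPC analyzes and improves Model Predictive Path Integral control (MPPI), whose core update is standard, i.e., sample perturbed action sequences around the current plan, roll them out through a dynamics model, and average them with softmax weights over negative costs. The following is a generic single MPPI update with a fixed diagonal sampling covariance (the quantity CoVO-MPC schedules optimally); dynamics and cost are placeholder callables, not anything taken from the paper's code.

    import numpy as np

    def mppi_step(plan, dynamics, cost, x0, n_samples=256, sigma=0.3, temperature=1.0, rng=None):
        """One MPPI update. plan has shape (horizon, act_dim); returns the new nominal plan."""
        rng = rng or np.random.default_rng(0)
        horizon, act_dim = plan.shape
        noise = rng.normal(scale=sigma, size=(n_samples, horizon, act_dim))
        candidates = plan[None] + noise                   # perturbed action sequences
        costs = np.empty(n_samples)
        for k in range(n_samples):
            x, c = x0, 0.0
            for t in range(horizon):
                x = dynamics(x, candidates[k, t])         # roll out the learned/known model
                c += cost(x, candidates[k, t])
            costs[k] = c
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        # importance-weighted average of the sampled sequences becomes the new plan
        return np.tensordot(weights, candidates, axes=1)

A lower temperature concentrates the update on the best-performing samples; CoVO-MPC's contribution is choosing the sampling covariance (here the fixed sigma) so that this iteration converges faster.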

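Illustrative sketch for the Generative Denoise Distillation entry: the abstract describes adding stochastic noise to the student's concept feature, passing it through a shallow generator, and aligning the output with the teacher's instance feature. A minimal version of such a loss might look like the following; the two-layer convolutional generator, the channel sizes, and the MSE alignment are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyFeatureDistill(nn.Module):
        def __init__(self, student_ch, teacher_ch, noise_std=0.1):
            super().__init__()
            self.noise_std = noise_std
            # shallow generator mapping noised student features into the teacher's feature space
            self.generator = nn.Sequential(
                nn.Conv2d(student_ch, teacher_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(teacher_ch, teacher_ch, kernel_size=3, padding=1),
            )

        def forward(self, student_feat, teacher_feat):
            # perturb the student's concept feature with Gaussian noise
            noised = student_feat + torch.randn_like(student_feat) * self.noise_std
            generated = self.generator(noised)
            # align the generated instance feature with the frozen teacher feature
            return F.mse_loss(generated, teacher_feat.detach())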
