【读论文】【泛读】三篇生成式自动驾驶场景生成: Bevstreet, DisCoScene, BerfScene

文章目录

  • 1. Street-View Image Generation from a Bird’s-Eye View Layout
    • 1.1 Problem introduction
    • 1.2 Why
    • 1.3 How
    • 1.4 My takeaway
  • 2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
    • 2.1 What
    • 2.2 Why
    • 2.3 How
    • 2.4 My takeaway
  • 3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)
    • 3.1 What
    • 3.2 Why
    • 3.3 How
    • 3.4 My takeaway

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

From the title of this paper, we know it bound a relation from Bev(Bird’s-Eye View) to Street view image.

在这里插入图片描述

Concretely, the input (Bev) is a two-dimensional representation of a three-dimensional environment from a top perspective. In the BEV diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, lane lines, etc. And green square means an ego vehicle that has three cameras in front.

The task is to generate three street-view images aligned to the Bev according to the relative position among these square objects.

As for the concept of “layout”, it should consider the effects of these factors:

  • Cameras with an overlapping field-of-view (FoV) must ensure overlapping content is correctly shown
  • The visual styling of the scene also needs to be consistent such that all virtual views appear to be created in the same geographical area (e.g., urban vs. rural), at the same time of day, with the same weather conditions, and so on.
  • In addition to this consistency, the images must correspond to the HD
    map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

It is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

  1. Methods
    在这里插入图片描述
    As shown in this pipeline, the Bev layout and source images were encoded as an input of the autoregressive transformer collaborating with direction and camera information to help the understanding of space. New mv-images were output.

  2. Experiments
    Three metrics are used.
    在这里插入图片描述
    FID represents the diversity and quality of generated images. Road mIoU and Vehicle mIoU can be used to represent the overlapping to verify the relative position in the Bev inputs.
    Scene edit was achieved by the change of Bev layout:
    在这里插入图片描述

1.4 My takeaway

  1. How to utilize the ability of an autoregressive transformer!!! Why do we use it other than others?
  2. I have known about what is Bev.

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model using object bounding boxes without semantic annotation as layout prior, allowing for high-quality scene synthesis and flexible user control of both the camera and scene objects.

2.2 Why

  • Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.

  • Some works like GSN can only generate scenes, without object-level editing. That is because of the lack of explicit object definition in NeRF.

  • GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet, it works poorly on mixed scenes due to the absence of proper spatial priors.

  • Interesting refer:

    17: Layout-transformer: Layout generation and completion with self-attention.

    26: Layout-gan: Generating graphic layouts with wireframe discriminators.

    58: Blockplanner: City block generation with vectorized graph representation.

2.3 How

在这里插入图片描述

Bounding boxes as layout priors to generate the objects, combined with the generated background were used in neural rendering. Meanwhile, an extra object discriminator for local discrimination is added, leading to better object-level supervision.

2.4 My takeaway

  1. Is it possible to cancel the manually marked bbox and automatically identify and regenerate the corresponding area in Gaussian?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)

3.1 What

Incorporating an equivariant radiance field with the guidance of a BEV map, this method allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.

Understood as the superposition of patches in a bev:
在ddd

3.2 Why

  1. Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis techniques since 3D scenes usually hold complex spatial configurations and consist of many objects at varying scales.
  2. Previous approaches often relied on scene graphs, facing limitations in processing due to unstructured topology.
  3. DiscoScene introduces complexity in interpreting the entire scene and
    faces scalability challenges when using Bbox.
  4. BEV maps could specify the composition and scales of objects clearly but lack insights into the detailed visual appearance of the objects. Recent attempts like InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficiency.

3.3 How

在这里插入图片描述

To integrate the prior information provided by the BEV map into the radiation field, the researchers introduced a generator U U U, which can generate a 2D feature map based on BEV map conditions. Builder U U U adopts a network structure that combines U-Net architecture and StyleGAN blocks.

3.4 My takeaway

  1. Confused about how to use this U-Net, need some other time to supplement background knowledge. 🤡

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/822165.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

LabVIEW变速箱自动测试系统

LabVIEW变速箱自动测试系统 在农业生产中,采棉机作为重要的农用机械,其高效稳定的运行对提高采棉效率具有重要意义。然而,传统的采棉机变速箱测试方法存在测试效率低、成本高、对设备可能产生损害等问题。为了解决这些问题,开发了…

HTTP协议安全传输教程

HTTP协议有多个版本,包括但不限于HTTP/0.9、HTTP/1.0、HTTP/1.1、HTTP/2和HTTP/3。这些版本各自具有不同的特点和改进,以适应网络技术的发展和满足不同的需求。例如,HTTP/1.0使用文本格式传输数据,简单易用且兼容性好,…

十大排序——11.十大排序的比较汇总及Java中自带的排序算法

这篇文章对排序算法进行一个汇总比较! 目录 0.十大排序汇总 0.1概述 0.2比较和非比较的区别 0.3基本术语 0.4排序算法的复杂度及稳定性 1.冒泡排序 算法简介 动图演示 代码演示 应用场景 算法分析 2.快速排序 算法简介 动图演示 代码演示 应用场景…

Linux LVM与磁盘配额

目录 一.LVM概述 LVM LVM机制的基本概念 PV(Physical Volume,物理卷) VG(Volume Group,卷组) LV(Logical Volume,逻辑卷) 二.LVM 的管理命令 三.创建并使用LVM …

04-12 周五基于VS code + Python实现CSDN发布文章的自动生成

简介 之前曾经说过,在撰写文章之后,需要,同样需要将外链的图像转换为的形式,因此,可以参考 04-12 周五 基于VS Code Python 实现单词的自动提取 配置步骤 配置task 在vscode的命令面板configure task。配置如下的任…

UbuntuServer22.04安装docker

通过ubuntuserver安装docker是搭建开发环境最便捷的方式之一。下面介绍一下再ubuntu22.04上如何安装docker。相关内容参考官网链接:Install Docker Engine on Ubuntu 根据官网推荐,利用apt命令的方式安装,首先需要设置docker仓库&#xff0c…

【Android AMS】startActivity流程分析

文章目录 AMSActivityStackstartActivity流程startActivityMayWaitstartActivityUncheckedLocked startActivityLocked(ActivityRecord r, boolean newTask, boolean doResume, boolean keepCurTransition)resumeTopActivityLocked 参考 AMS是个用于管理Activity和其它组件运行…

贴片滚珠振动开关 / 振动传感器的用法

就是这种小东西: 上面的截图来自:https://item.szlcsc.com/3600130.html 以前写过一篇介绍这种东西内部的结构原理:贴片微型滚珠振动开关的结构原理。就是有个小滚珠会接通开关两边的电极,振动时滚珠会在内部蹦跳,开关…

手把手教你设计报表,轻松做出一份美观又实用的报表

你是不是在看着自己的呆板、没有特色的报表而深感苦恼,但同事却可以使用同样的数据制作并且展现出多样化的丰富美观的报表。要知道,除了要达成数据的准确度,基本的数据分析维度需求之外,报表的美观程度也有众多的隐藏“福利”与好…

自定义类似微信效果Preference

1. 为自定义Preference 添加背景&#xff1a;custom_preference_background.xml <?xml version"1.0" encoding"utf-8"?> <selector xmlns:android"http://schemas.android.com/apk/res/android"><item><shape android:s…

LRTimelapse for Mac:专业延时摄影视频制作利器

LRTimelapse for Mac是一款专为Mac用户设计的延时摄影视频制作软件&#xff0c;它以其出色的性能和丰富的功能&#xff0c;成为摄影爱好者和专业摄影师的得力助手。 LRTimelapse for Mac v6.5.4中文激活版下载 这款软件提供了直观易用的界面&#xff0c;用户可以轻松上手&#…

【从浅学到熟知Linux】进程控制下篇=>进程程序替换与简易Shell实现(含替换原理、execve、execvp等接口详解)

&#x1f3e0;关于专栏&#xff1a;Linux的浅学到熟知专栏用于记录Linux系统编程、网络编程等内容。 &#x1f3af;每天努力一点点&#xff0c;技术变化看得见 文章目录 进程程序替换什么是程序替换及其原理替换函数execlexeclpexecleexecvexecvpexecvpeexecve 替换函数总结实现…

宝剑锋从磨砺出,透视雀巢咖啡品牌焕新与产品升级的想象力

自1989年进入中国市场以来&#xff0c;陪伴着国内咖啡行业由启蒙期走向兴盛期的雀巢咖啡&#xff0c;始终坚持以消费者高品质、个性化需求为本位&#xff0c;在保有独特性的基础上持续创新&#xff0c;实现了从无到有的攻克与突破。 近日&#xff0c;深耕中国三十六载的雀巢咖…

康耐视visionpro-CogCreateLinePerpendicularTool操作操作工具详细说明

CogCreateLinePerpendicularTool]功能说明&#xff1a; 创建点到线的垂线 CogCreateLinePerpendicularTool操作说明&#xff1a; ①.打开工具栏&#xff0c;双击或点击扇标拖拽添加CogCreateLinePerpendicularTool ②.添加输入源&#xff1a;右键“链接到”或以连线拖 拽的方式…

Java代码基础算法练习-圆的面积-2024.04.17

任务描述&#xff1a; 已知半径r&#xff0c;求一个圆的面积(保留两位小数)&#xff0c;其中 0 < r < 5&#xff0c;PI 3.14&#xff0c;圆面积公式: PI * r * r 任务要求&#xff1a; 代码示例&#xff1a; package April_2024;import java.util.Scanner;// 已知半径…

【前端】1. HTML【万字长文】

HTML 基础 HTML 结构 认识 HTML 标签 HTML 代码是由 “标签” 构成的. 形如: <body>hello</body>标签名 (body) 放到 < > 中大部分标签成对出现. <body> 为开始标签, </body> 为结束标签.少数标签只有开始标签, 称为 “单标签”.开始标签和…

Linux-时间同步服务器

1. (问答题) 一.配置server主机要求如下&#xff1a; 1.server主机的主机名称为 ntp_server.example.com 编写脚本文件 #!/bin/bash hostnamectl hostname ntp_server.example.com cd /etc/NetworkManager/system-connections/ rm -fr * cat > eth0.nmconnection <&…

【编译原理】02词法分析(1)

接上篇 &#xff1a;【编译原理】01引论 词法分析是编译过程中将字符流转换成为符号流的一个工作阶段&#xff0c;是编译的第一步工作&#xff0c;其后续工作是语法分析。 词法分析的输入是源代码&#xff1b; 词法分析的输出是符号流&#xff1b; 词法分析需要识别词法错误&am…

STM32 软件I2C方式读取MT6701磁编码器获取角度例程

STM32 软件I2C方式读取MT6701磁编码器获取角度例程 &#x1f4cd;相关篇《STM32 软件I2C方式读取AS5600磁编码器获取角度例程》&#x1f33f;《Arduino通过I2C驱动MT6701磁编码器并读取角度数据》&#x1f530;MT6701芯片和AS5600从软件读取对比&#xff0c;只是读取的寄存器和…

代码随想录算法训练营第56天| 583. 两个字符串的删除操作|72. 编辑距离|编辑距离总结篇

代码随想录算法训练营第56天| 583. 两个字符串的删除操作|72. 编辑距离|编辑距离总结篇 详细布置 583. 两个字符串的删除操作 本题和动态规划&#xff1a;115.不同的子序列 相比&#xff0c;其实就是两个字符串都可以删除了&#xff0c;情况虽说复杂一些&#xff0c;但整体思…