Yijun Huang's Homepage

artificial intelligence

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

Recently I was reading an article about a cool project that aims to have a neural network create new songs in the style of members of the 27 Club (artists who died tragically at or near age 27, at the height of their careers), such as Amy Winehouse, Jimi Hendrix, Kurt Cobain and Jim Morrison.

The project was created by Over the Bridge, an organization dedicated to increasing awareness of mental health and substance abuse in the music industry, trying to denormalize... read more

So, what is a neural network? April 2, 2021 9 minute read

Technology is so omnipresent nowadays that news about AI has become commonplace. A quick glance at today’s headlines gives me:

  • This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech.
  • How A.I.-powered companies dodged the worst damage from COVID
  • AI technology detects ‘ticking time bomb’ arteries
  • AI in Drug Discovery Starts to Live Up to the Hype
  • Pentagon seeks commercial solutions to get its data ready for AI

Topics from business, manufacturing, supply chain, medicine and biotech and even defense are covered in those news... read more

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

Background

After 17 years of a corporate career away from computer science, I have now decided to learn Machine Learning and in the process return to coding (something I have always loved!).

To fully grasp the essence of ML I decided to start by coding an ML library myself, so I can fully understand the inner workings, linear algebra and calculus involved in Stochastic Gradient Descent. On top of that, I wanted to learn Python (I used to code in C++ 20 years ago).

I built a general purpose basic ML library that... read more

cesium ion

On the trend towards the digital twin February 8, 2023 7 minute read

From Nature Comments by Tao

Digital twins — precise, virtual copies of machines or systems — are revolutionizing industry. Driven by data collected from sensors in real time, these sophisticated computer models mirror almost every facet of a product, process or service. Many major companies already use digital twins to spot problems and increase efficiency1. Half of all corporations might be using them by 2021, one analyst predicts.

For instance, NASA uses digital copies to monitor the status of its spacecraft. Energy companies General Electric (GE) and Chevron use them to track the operations of wind turbines. Singapore... read more

coding

On the trend towards the digital twin February 8, 2023 7 minute read

A brief review for deep Visual Odometry since 2016 July 20, 2022 22 minute read

Application of Deep Learning in Visual Odometry: A Brief Literature Review

Abstract

Visual odometry is a technique for estimating camera egomotion from consecutive image frames and has important applications in areas such as UAV navigation and augmented reality. Traditional visual odometry mainly uses geometry-based methods, which enable near real-time operation on drones and robots. However, the classical methods are of limited use in challenging cases because of problems such as sensitivity to scene illumination and difficulty handling dynamic environments. With the booming development of deep learning in recent years, related techniques combined with visual odometry have emerged as... read more

A systematic overview for Visual Odometry of VSLAM June 30, 2022 5 minute read

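Visual Odometry Techniques (in VSLAM)

What is SLAM?

SLAM stands for Simultaneous Localization and Mapping [1]. It refers to an agent carrying specific sensors, such as a robot or a drone, building a model of its environment while it moves, without any prior knowledge of that environment. The concept of SLAM was proposed as early as 1986 [2]. However, early SLAM systems often relied on expensive or custom-built sensors such as lidar, sonar or stereo cameras, so the technology did not reach the market. As algorithms and computing power improved, inexpensive cameras gradually became a viable replacement for lidar and other complex equipment in SLAM. When the sensors involved are mainly cameras, the approach is called Visual SLAM (VSLAM), which is the subject of this post's title and its main topic.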

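The classic visual SLAM framework

As shown in Figure 1, a classic visual SLAM system consists of four main parts: Visual Odometry (VO), Optimization (the back end), Loop Closing (loop-closure detection), and Mapping.

[Figure 1: The classic visual SLAM framework]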

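VO estimates camera motion from the images of adjacent frames and recovers the spatial structure of the scene. However, VO alone is not enough. VO is somewhat like a Markov chain: it only considers the relationship between the current state and the next one, and it has no memory. Because VO has only the memory of a 🐟 and every estimate carries some error, the error in the estimated camera pose keeps accumulating as the robot or drone moves, forming accumulated drift. This drift can sometimes have extremely bad consequences.

As Figure 2 illustrates, imagine that the estimate says the camera rotated 90 degrees clockwise while it actually rotated only 89 degrees. In an empty rectangular room, the estimated position may then move further and further from the camera's true position, and the resulting map will likely fail to close.

[Figure 2: Growing drift and a map that fails to close]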
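The 90-versus-89-degree example can be checked numerically. The sketch below is only an illustration under assumed values: the 4 m by 3 m rectangle, the number of laps and the function names are hypothetical and not taken from the post. It integrates the same walk twice, once with 90-degree turns and once with 89-degree turns, and reports how far apart the two end points are.

```python
import math

def dead_reckon(turn_deg, laps=1, sides=(4.0, 3.0, 4.0, 3.0)):
    """Integrate a walk around a rectangle, turning clockwise by turn_deg
    after every side, and return the final (x, y) position."""
    x, y, heading = 0.0, 0.0, 0.0
    for _ in range(laps):
        for length in sides:
            x += length * math.cos(heading)
            y += length * math.sin(heading)
            heading -= math.radians(turn_deg)  # clockwise turn
    return x, y

def drift(laps):
    """Distance between the end points obtained with 90- and 89-degree turns."""
    ex, ey = dead_reckon(90.0, laps)
    tx, ty = dead_reckon(89.0, laps)
    return math.hypot(ex - tx, ey - ty)

# Assumed room size (4 m x 3 m); the gap grows with every lap.
for laps in (1, 2, 3):
    print(laps, round(drift(laps), 3))
```

With 90-degree turns the path returns to its starting point, so the printed value is also the size of the gap that keeps the other trajectory from closing.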

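We therefore need to examine and correct this drift from a more global perspective. This is where back-end optimization and loop-closure detection come in.

Back-end optimization pursues a longer-term goal. Beyond solving “how to estimate camera motion from images”, it estimates the state of the whole system from noisy data, quantifies the uncertainty of that estimate, and keeps the resulting camera poses as globally consistent as possible. Compared with VO, Optimization is less visible: it deals only with data and does not care whether the data came from a lidar or a monocular camera. Because it also runs after visual odometry, it is called the back end. Note, however, that front end and back end here mean something different from the front end and back end of web applications (such as Servlets in J2EE); the two should not be confused.

Loop-closure detection mainly determines whether the robot or drone has returned to a place it visited before. If a loop is detected (usually by judging image similarity), the information is passed to the back end to produce a globally consistent estimate. To use an analogy that is not quite accurate but that I find vivid: it is like taking beads (the estimated points) strung together by little sticks (the predicted trajectory) and joining the head to the tail while keeping the distances between the beads in between unchanged. The most widely used loop-closure detection method at present is the Bag-of-Words model, which will be covered in detail later.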
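As a rough illustration of loop detection by image similarity, the sketch below compares bags of visual-word IDs with cosine similarity. It is a simplified stand-in, not the method used by any particular SLAM system: the word IDs, the threshold of 0.8 and the function names are all assumptions, and practical Bag-of-Words systems typically add a trained vocabulary and word weighting.

```python
import math
from collections import Counter

def bow_similarity(words_a, words_b):
    """Cosine similarity between two bags of visual-word IDs."""
    hist_a, hist_b = Counter(words_a), Counter(words_b)
    dot = sum(hist_a[w] * hist_b[w] for w in hist_a)
    norm_a = math.sqrt(sum(c * c for c in hist_a.values()))
    norm_b = math.sqrt(sum(c * c for c in hist_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def detect_loop(current, keyframes, threshold=0.8):
    """Return the index of the most similar earlier keyframe whose score
    reaches the (assumed) threshold, or None if there is no such keyframe."""
    best_idx, best_score = None, threshold
    for idx, words in enumerate(keyframes):
        score = bow_similarity(current, words)
        if score >= best_score:
            best_idx, best_score = idx, score
    return best_idx

# Made-up word IDs for three earlier keyframes and the current frame.
keyframes = [[1, 2, 3, 3], [7, 8, 9], [4, 4, 5]]
print(detect_loop([3, 1, 2, 3], keyframes))
```

A detected index would then be handed to the back end as a constraint between the current pose and the earlier one.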

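The rest of this post describes the VO part of the VSLAM framework in detail.

Visual Odometry

As described above, Visual Odometry mainly computes the camera pose relationship between image frames, that is, it estimates the camera's position and orientation from captured images. Depending on the type of camera used, VO can be divided into monocular VO and stereo VO [3]. Monocular VO uses a single camera to capture 2D information about the environment, whereas stereo VO, using RGB-D or binocular cameras, obtains scene depth in addition to the image, either directly through structured light or ToF, or by computation (similar to the human eye).

In practice, RGB-D cameras are easily disturbed by natural light, are not very robust to noise, and are relatively expensive, which hinders adoption, so they are mainly used for indoor SLAM. For binocular cameras, accuracy and depth range depend on the baseline length, that is, the distance between the two cameras (but a wide baseline makes the rig prone to deformation, which introduces error, and a wide rig also hampers movement). Computing the disparity map also consumes a lot of resources and often needs GPU or FPGA acceleration, so depth measurement rarely reaches a satisfactory result.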
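The dependence of depth range on baseline follows from the standard pinhole relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline and d the disparity. The sketch below is a minimal illustration; the 700 px focal length, the baselines and the 1 px minimum disparity are assumed values, not figures from the post.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, smallest measurable disparity 1 px.
# The depth at that smallest disparity is the farthest distance the rig resolves.
for baseline in (0.06, 0.12, 0.24):
    print(baseline, depth_from_disparity(700.0, baseline, 1.0))
```

Doubling the baseline doubles the farthest resolvable depth, which is why a wider rig measures further but becomes harder to keep rigid.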

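Monocular SLAM is therefore the main topic of this post. In practice, VO algorithms fall into two categories: feature-based methods and direct methods.

Feature-based methods predict the overall camera motion by aggregating the motion of representative points in the image. Judging motion at the level of the whole image matrix is very difficult (LK optical flow needs strong assumptions), so we can describe the image with another representation, its features, and discard unnecessary information (features can also be seen as the principal components of the image). Although feature points may be hard to detect when facing walls or other regions without salient corners [4], in the vast majority of scenes enough feature points can be found to give a rough estimate of inter-frame motion.

Traditional feature detectors include Harris corners (see BUAASE_CV_hw_set2) and FAST corners [5]. These classic corner detectors were proposed relatively early and are not stable enough when the image changes substantially. More recent local feature methods not only detect the corners (or interest points) themselves but also provide a descriptor for each one, recording information such as the feature's orientation and scale. SIFT, for example, is a classic algorithm that is robust to illumination, scale and rotation. That quality, however, comes with a heavy computational cost. Unlike SfM, SLAM has to run in real time, so SIFT, familiar from class, is rarely used in practical SLAM systems.

So is there an algorithm that balances accuracy, robustness and computational cost well enough to fit SLAM? Of course! It is the famous ORB-SLAM. As the figure shows, ORB-SLAM, like YOLO, has been updated through several versions, which demonstrates its strong vitality.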

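[Figure 3: Papers for the different versions of ORB-SLAM (image by the author)]

ORB (Oriented FAST and Rotated BRIEF) is among the fastest and most stable feature detection and description algorithms; many image stitching and object tracking techniques are built on ORB features [6]. ORB-SLAM... read more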
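BRIEF-style descriptors, which ORB builds on, are binary strings, and such descriptors are commonly compared by Hamming distance (the number of differing bits). The sketch below shows brute-force matching on toy 8-bit descriptors stored as integers. The descriptor values, the distance threshold and the function names are made up for illustration, and real ORB descriptors are much longer.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(query, train, max_distance=2):
    """Brute-force nearest-neighbour matching.
    Returns (query_index, train_index, distance) for accepted matches."""
    if not train:
        return []
    matches = []
    for qi, q in enumerate(query):
        ti, dist = min(((i, hamming(q, t)) for i, t in enumerate(train)),
                       key=lambda pair: pair[1])
        if dist <= max_distance:  # reject weak matches (assumed threshold)
            matches.append((qi, ti, dist))
    return matches

# Toy 8-bit descriptors from two hypothetical frames.
frame_a = [0b10110010, 0b01001101, 0b11110000]
frame_b = [0b01001111, 0b10110011, 0b00001111]
print(match_descriptors(frame_a, frame_b))
# prints [(0, 1, 1), (1, 0, 1)]
```

The third descriptor in frame_a has no neighbour within the threshold, so it is left unmatched rather than forced onto a poor candidate.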

copyright

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

creativity

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

database

Database Review 1 March 3, 2022 less than 1 minute read

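DBReview-1: Data Models

The conceptual model and its role

From Wikipedia: The conceptual level unifies the various external views into a compatible global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.

The conceptual model is an intermediate layer between the real world and the machine world. It is used to model the information world and is the first level of abstraction from the real world to the information world. It is a powerful tool for database designers and also the language in which designers and users communicate. Conceptual models are usually expressed as E-R diagrams; examples are given below.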

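Entity:

Things that exist objectively and can be distinguished from one another are called entities. An entity can be a concrete person, event or object, or an abstract concept or relationship. For example, an employee, a student, a department, a course, a student's enrollment in a course, a department's purchase order, and the working relationship between a teacher and a department (that is, a given teacher working in a given department) are all entities.

Entity type:

Entities with the same attributes necessarily share common characteristics and properties. An entity name together with its set of attribute names, used to abstract and characterize entities of the same kind, is called an entity type. For example, Student(student ID, name, gender, date of birth, department, enrollment date) is an entity type.

Entity set:

A collection of entities of the same type is called an entity set. For example, all students form an entity set, while Student as a category is an entity type.

Relationships between entities:

In the real world, things are related internally and to one another. In the information world these relationships appear as relationships within an entity (type) and relationships between entities (types). Relationships within an entity usually mean relationships among the attributes that make up the entity; relationships between entities usually mean relationships between different entity sets. Relationships between entities can be one-to-one, one-to-many or many-to-many.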
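The distinction between entity type, entity set and relationship can be made concrete in code. The following is a minimal sketch, not a database design: the Student attributes follow the example above, while the Course type, the sample records and the helper functions are hypothetical additions used only to show a many-to-many relationship.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Student:
    """Entity type: Student(student ID, name, gender, date of birth,
    department, enrollment date)."""
    student_id: str
    name: str
    gender: str
    birth_date: date
    department: str
    enrolled_on: date

@dataclass(frozen=True)
class Course:
    """A second, hypothetical entity type used to show a relationship."""
    course_id: str
    title: str

# Entity sets: all entities of one type (sample data is made up).
students = {
    "S1": Student("S1", "Li Hua", "F", date(2002, 5, 1), "CS", date(2020, 9, 1)),
    "S2": Student("S2", "Wang Lei", "M", date(2001, 11, 12), "CS", date(2020, 9, 1)),
}
courses = {"C1": Course("C1", "Databases"), "C2": Course("C2", "Computer Vision")}

# Many-to-many relationship between the two entity sets, stored as ID pairs.
enrollments = {("S1", "C1"), ("S1", "C2"), ("S2", "C1")}

def courses_of(student_id):
    """Titles of the courses a student takes, sorted for stable output."""
    return sorted(courses[c].title for s, c in enrollments if s == student_id)

def students_in(course_id):
    """Names of the students taking a course, sorted for stable output."""
    return sorted(students[s].name for s, c in enrollments if c == course_id)

print(courses_of("S1"), students_in("C1"))
```

One student maps to several courses and one course to several students, which is what makes the enrollment relationship many-to-many.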

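Conceptual model examples:

  1. A school has several departments. Each department has several classes and several teaching and research offices. Each office has several teachers, and some professors and associate professors each supervise several graduate students. Each class has several students, each student takes several courses, and each course can be taken by several students. Draw the school's conceptual model as an E-R diagram.

    [Figure q1: E-R diagram for this example]

  2. A factory produces several products. Each product is made up of different parts, and some parts can be used in different products. The parts are made from different raw materials, and different parts may use the same material. The parts are stored in warehouses according to the products they belong to,... read more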

Database Review 0 March 3, 2022 2 minute read

DBReview-0: Data and Databases

Data, databases, database management systems, and database systems

From Wikipedia: In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues including supporting concurrent access and fault tolerance.

A database management system (DBMS) is the... read more

deep Neural networks

Neural Network Optimization Methods and Algorithms March 12, 2021 8 minute read

For the seemingly small project I undertook of creating a machine learning neural network that could learn by itself to play tic-tac-toe, I ran into the necessity of implementing at least one momentum algorithm for the optimization of the network during backpropagation.

And since my original post for the TicTacToe project is quite large already, I decided to post these optimization methods, and how I implemented them in my code, separately.

Adam

source

Adaptive Moment Estimation (Adam) is an optimization method that computes adaptive learning rates for each weight and bias. In addition to storing an... read more
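For reference, the standard Adam update can be sketched for a single scalar parameter as below. This is not the implementation from the post, which is cut off in this excerpt; it uses the commonly cited default hyperparameters apart from an assumed learning rate of 0.1, and the quadratic test function is a made-up example.

```python
import math

def adam_minimize(grad, w0, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function with Adam, given its gradient function."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

In a network, the same update would be applied element-wise, with separate m and v values kept for every weight and bias.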

digital twin

On the trend towards the digital twin February 8, 2023 7 minute read

general blogging

Starting the adventure March 24, 2021 10 minute read

In the midst of a global pandemic caused by the SARS-CoV-2 coronavirus, I decided to start blogging. I had wanted to blog for a long time and have always enjoyed writing, but many unknowns and having “no time” for it kept me from taking it up. Things like “I don’t really know who my target audience is”, “what would my topic or topics be?”, “I don’t think I am a world-class expert in anything”, and many more kept stopping me from setting up my own blog. Now seemed like as good a time as any so with those and tons of other... read more

life

Starting the adventure March 24, 2021 10 minute read

machine learning

A brief review for deep Visual Odometry since 2016 July 20, 2022 22 minute read

A systematic overview for Visual Odometry of VSLAM June 30, 2022 5 minute read

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

neural networks

A brief review for deep Visual Odometry since 2016 July 20, 2022 22 minute read

A systematic overview for Visual Odometry of VSLAM June 30, 2022 5 minute read

Who owns the copyright for an AI generated creative work? April 20, 2021 4 minute read

optimization

Neural Network Optimization Methods and Algorithms March 12, 2021 8 minute read

python

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

Machine Learning Library in Python from scratch February 28, 2021 4 minute read

It must sound crazy that in this day and age, when we have such a myriad of amazing machine learning libraries and toolkits all open sourced, all quite well documented and easy to use, I decided to create my own ML library from scratch.

Let me try to explain: I am in the process of immersing myself in the world of Machine Learning, and to do so I want to deeply understand the basic concepts and their foundations. I think that there is no better way to do so than by creating myself all the... read more

Conway's Game of Life February 10, 2021 3 minute read

Lately I have been trying to take up coding again. It has been a part of my life since my early years, when I learned to program a Tandy Color Computer at the age of 8, the good old days.

Tandy Color Computer TRS80 III

Having already programmed in Java, C# and of course BASIC, I thought it would be a great idea to learn Python since I have great interest in data science and machine learning, and those two topics seem to have an avid community among Python coders.

For one of my starter quick programming... read more

reinforcement learning

Deep Q Learning for Tic Tac Toe March 18, 2021 12 minute read

thoughts

Starting the adventure March 24, 2021 10 minute read

unreal engine

On the trend towards the digital twin February 8, 2023 7 minute read

  • artificial intelligence (3)
  • cesium ion (1)
  • coding (10)
  • copyright (1)
  • creativity (1)
  • database (2)
  • deep Neural networks (1)
  • digital twin (1)
  • general blogging (1)
  • life (1)
  • machine learning (8)
  • neural networks (6)
  • optimization (1)
  • python (3)
  • reinforcement learning (1)
  • thoughts (1)
  • unreal engine (1)

    2024 © Huang Yijun
