Analysis of the “VM resize revert fails” problem
Author: 张航东
Version: Kilo 2015.1.1
1. Problem
While testing Kilo 2015.1.1, we intermittently hit an error in the resize-revert function. The error eventually put the VM into “Error” status because of a “VirtualInterfaceCreateException”.
The error is easy to reproduce with the following steps:
Step 1. Launch 3-5 VMs.
Step 2. Resize these VMs one by one, but do not confirm.
Step 3. Revert them one by one, then repeat Step 2 and Step 3. Some VMs will stay in “reverting” status and finally go to “Error”.
The following “VirtualInterfaceCreateException” traceback then appears in “nova-compute.log”:
File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 298, in decorated_function
return function(self, context, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 377, in decorated_function
return function(self, context, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 355, in decorated_function
kwargs['instance'], e, sys.exc_info())
File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 343, in decorated_function
return function(self, context, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3868, in finish_revert_resize
block_device_info, power_on)
File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6788, in finish_revert_migration
raise ex
VirtualInterfaceCreateException: Virtual Interface creation failed
2. Analysis
2.1 Reason in short
In short, the error occurs because nova waits for an event (network-vif-plugged-xxxx<port id>) from neutron, but neutron does not send it because the port (vif) is inconsistent with its binding host. Nova therefore times out while waiting for the event.
In the normal case, nova does not wait for the event at all.
2.2 Resize revert success (Sequence)
The sequence diagram above shows a successful resize revert. The important steps are:
Step 1.1.1: vif.active is set to true.
Step 1.3.1.1: no event is registered for waiting, because vif.active is not false.
Step 1.3.2.1: after libvirt is called to create the VM, there is no event to wait for, so the process keeps going.
Step 1.4.1.1: neutron changes the host of the port binding in the DB (neutron.ml2_port_bindings).
For example, we create a VM on host_A, so the VM’s port is bound to host_A. If we resize the VM from host_A to host_B and neither confirm nor revert, the VM’s port is bound to host_B. When we then revert the VM, the VM’s port is bound to host_A again, and that change is made in this step.
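Steps 1.3.1.1 and 1.3.2.1 depend on one filter: only VIFs reported as inactive produce an event to wait for. The sketch below is a simplified, standalone illustration of that rule. The function name and the dictionary layout are assumptions made for this example and are not copied from the nova source.

```python
def get_neutron_events(network_info):
    """Return the (event_name, port_id) pairs nova would wait for.

    Illustrative only: a VIF yields an event only when its "active"
    flag is explicitly False.
    """
    return [("network-vif-plugged", vif["id"])
            for vif in network_info
            if vif.get("active", True) is False]


# Success case: vif.active is true, so nothing is waited for.
active_vif = [{"id": "port-1", "active": True}]
# Failure case: vif.active is false, so nova waits for the event.
inactive_vif = [{"id": "port-1", "active": False}]

no_wait = get_neutron_events(active_vif)
must_wait = get_neutron_events(inactive_vif)
print(no_wait)
print(must_wait)
```

With an active VIF the list is empty and the revert proceeds immediately. With an inactive VIF the list holds one entry, which is the event nova blocks on in the failure case.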
2.3 Resize revert failed (Sequence)
The sequence diagram above shows a failed resize revert. There are some differences:
Step 1.1.1: vif.active is set to false.
Step 1.3.1.1: an event named network-vif-plugged-xxxx is registered for waiting.
Step 1.3.2.1: after libvirt is called to create the VM, nova blocks and waits for the event (network-vif-plugged-xxxx).
Step 2.1.1.1.1: neutron reads the host the port is bound to from the DB (neutron.ml2_port_bindings) and compares it with the host the VM will revert to. Because they are inconsistent (the DB is stale), neutron returns at once and does not send the event nova is waiting for.
Step 1.4.1.1: as mentioned in the “Resize revert success” section, the host of the port binding in the DB would be changed here. But this operation is called by “1.4 migrate_instance_finish()”, and migrate_instance_finish() cannot run because nova is blocked waiting for the event.
So the error is raised.
The following is the neutron code that skips sending the event:
PS: “port_host” comes from the DB; “host” is an input parameter and comes from the target host (the host the VM reverts to). They are inconsistent, and we can see that the info in the DB is wrong.
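The host comparison described in the PS can be pictured with a small standalone model. This is a sketch under stated assumptions, not the neutron implementation: `PORT_BINDINGS` is a hypothetical in-memory stand-in for neutron.ml2_port_bindings, and appending to `sent_events` stands in for whatever neutron does to notify nova.

```python
# Hypothetical stand-in for the neutron.ml2_port_bindings table:
# after the resize, the port is still bound to host_B.
PORT_BINDINGS = {"port-1": "host_B"}


def port_bound_to_host(port_id, host):
    """Compare the host stored in the DB with the host reported by the agent."""
    port_host = PORT_BINDINGS.get(port_id)  # "port_host" is from the DB
    return port_host == host                # "host" is the input parameter


def update_device_up(port_id, host, sent_events):
    """Notify nova only when the binding matches the reporting host."""
    if host and not port_bound_to_host(port_id, host):
        return False  # return at once, no event is sent
    sent_events.append(("network-vif-plugged", port_id))
    return True


events = []
# Revert to host_A while the DB still says host_B: nothing is sent.
update_device_up("port-1", "host_A", events)
events_with_stale_binding = list(events)

# Once the binding is corrected, the same call sends the event.
PORT_BINDINGS["port-1"] = "host_A"
update_device_up("port-1", "host_A", events)
print(events_with_stale_binding, events)
```

The first call models the failure: the agent on host_A reports the device as up, but the stale binding makes the check fail and nova never receives its event.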
2.4 Why there is difference between success and failure
According to the analysis above, there is one main difference between success and failure: the value of vif.active (true in the success case, false in the failure case).
The following sequence diagram shows how the vif status changes during resize-revert.
Note: the source/target hosts mentioned above are relative to the revert operation. For example, we create a VM on host_A and resize it to host_B (without confirming). When we then revert the VM, host_B is the source host and host_A is the target host.
There are 3 processes in the sequence diagram above:
1. The revert_resize() function on the source host (host_B).
2. The finish_revert_resize() function on the target host (host_A).
3. The linuxbridge neutron agent daemon on the source host (host_B). The daemon polls at a 2-second interval by default, which can be changed in “/etc/neutron/plugins/linuxbridge/linuxbridge_conf.ini” on the compute host.
Some important steps:
Step 1.1: on the source host, the tap device is removed by libvirt.
Step 1.2: on the target host, the finish_revert_resize() function runs.
Step 1.2.1.1: on the target host, the _build_network_info_model() function gets the vif status through client.list_ports() and then sets vif.active to true or false.
Step 2.1: at the same time, on the source host, the linuxbridge neutron agent daemon finds that the device (port) info has changed (the device was removed) and starts process_network_devices() and treat_devices_removed().
Step 2.1.1.1: on the source host, the linuxbridge neutron agent daemon sets the vif status to DOWN.
Normally, step 1.2.1.1 runs before step 2.1.1.1, because the latter is triggered by the daemon at its 2s interval, so the resize-revert succeeds.
Occasionally, however, step 2.1.1.1 runs before step 1.2.1.1, and then the error is raised.
There is still one unreasonable thing: even in the success case, _build_network_info_model() on the target host gets the vif status as “Active”, but at that time “Active” is the vif status on the source host.
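The race between step 1.2.1.1 and step 2.1.1.1 can be replayed deterministically with a toy model. The function and step names below are hypothetical labels chosen for this illustration. The only thing modeled is the order in which the port status is read and written.

```python
def revert_outcome(order):
    """Replay the two racing steps in the given order.

    "nova_reads_status" models step 1.2.1.1 (target host reads the port status).
    "agent_marks_down"  models step 2.1.1.1 (source-host agent sets the port DOWN).
    """
    port_status = "ACTIVE"  # status left over from the source host
    vif_active = None
    for step in order:
        if step == "agent_marks_down":
            port_status = "DOWN"
        elif step == "nova_reads_status":
            vif_active = (port_status == "ACTIVE")
    # An inactive vif makes nova wait for network-vif-plugged.
    return "no wait" if vif_active else "wait for network-vif-plugged"


usual = revert_outcome(["nova_reads_status", "agent_marks_down"])
unlucky = revert_outcome(["agent_marks_down", "nova_reads_status"])
print(usual)
print(unlucky)
```

In the usual order nova reads the stale “ACTIVE” status and skips the wait. In the other order it reads “DOWN”, registers the event, and ends up in the failure path described in section 2.3.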
3. Solution
3.1 Solution 1 – Set “vif_plugging_is_fatal = false” in nova.conf
At first glance, this may not look like a good way to fix the error.
But I think that in NFV scenarios customers may not create new VMs frequently; what they care about most is how to maintain all existing VMs. In that case resize/migrate/evacuate matter more, and with “vif_plugging_is_fatal = false” we always get an active VM, even if its vif is wrong. I think this is better than a VM in error state.
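The effect of this option can be sketched as a wait with a timeout whose failure is either fatal or ignored. The code below is an illustrative model only: `wait_for_vif_plugged` is a hypothetical helper, the exception class is a local stand-in for nova's exception of the same name, and `threading.Event` replaces nova's own event machinery.

```python
import threading


class VirtualInterfaceCreateException(Exception):
    """Local stand-in for the nova exception of the same name."""


def wait_for_vif_plugged(event, timeout, vif_plugging_is_fatal):
    """Wait for the plug event; on timeout, fail or continue per the flag."""
    if event.wait(timeout):
        return "plugged"
    if vif_plugging_is_fatal:
        raise VirtualInterfaceCreateException("Virtual Interface creation failed")
    return "continued without event"


never_sent = threading.Event()  # neutron never sends the event

# Non-fatal: the VM still comes up, possibly with a wrong vif.
relaxed = wait_for_vif_plugged(never_sent, 0.01, vif_plugging_is_fatal=False)

# Fatal: the timeout turns into the exception seen in nova-compute.log.
try:
    wait_for_vif_plugged(never_sent, 0.01, vif_plugging_is_fatal=True)
    strict = "no error"
except VirtualInterfaceCreateException as exc:
    strict = str(exc)

print(relaxed)
print(strict)
```

Note that the non-fatal path still waits for the full timeout before continuing, so the revert becomes slower rather than failing.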
3.2 Solution 2 – Modify code
The two sequence diagrams above show that nova does nothing in the “setup_networks_on_host()” function (step 1.2 and step 1.2.1 in the sequence diagrams).
We will change this so that the function actually sets up the network and changes the host of the port binding. Then, later in the process, neutron gets the correct info (host of the port binding) from the DB.
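The intent of this change can be shown with a toy model of the ordering. This is a hedged sketch, not the actual patch: `finish_revert`, the `bindings` dictionary and the `fix_binding_first` flag are hypothetical names, and the host check is reduced to a single comparison.

```python
def finish_revert(bindings, port_id, target_host, fix_binding_first):
    """Model finish_revert_resize() with and without the proposed change."""
    if fix_binding_first:
        # Proposed behavior: setup_networks_on_host() rebinds the port early.
        bindings[port_id] = target_host

    # Neutron sends the event only if the binding matches the target host.
    event_sent = bindings[port_id] == target_host
    if not event_sent:
        return "timeout"  # nova waits until vif plugging times out

    # migrate_instance_finish() is reached only after the wait completes.
    bindings[port_id] = target_host
    return "active"


before_fix = finish_revert({"port-1": "host_B"}, "port-1", "host_A",
                           fix_binding_first=False)
after_fix = finish_revert({"port-1": "host_B"}, "port-1", "host_A",
                          fix_binding_first=True)
print(before_fix)
print(after_fix)
```

The model assumes the failure case in which nova does wait for the event. In that case, correcting the binding before the wait is what lets neutron's host check pass.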