A Trip to Mindahuang (5): Combining OpenCV with CUDA Programming

lps_683

Blog category: Graduation project

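Following the blog post at http://www.cnblogs.com/dwdxdy/p/3528711.html and my own experiments, there are three ways to bring GPU computation into a program:

1. Use the GPU module that ships with OpenCV

2. Program against the CUDA API alone

3. Use the interfaces OpenCV provides together with the CUDA API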

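Using only the GPU functions in OpenCV, as demonstrated in part (3) of this series, is very simple and gives fairly good results. The drawback is just as obvious: relying directly on someone else's functions is inflexible. Parallel computations usually have an optimum that depends on the problem, and finding it for a real problem means thinking it through yourself rather than just calling a library function.

Using CUDA alone means a great deal of work. As mentioned in the earlier posts, the GPU should handle the heavy, repetitive computation while the CPU takes care of everything else, and that "everything else" can be done with ready-made functions such as those in OpenCV. So I do not see the point of writing everything in CUDA either.

That leaves the third option: for the topic I am working on, combining OpenCV with CUDA is the best fit.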

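This time I mainly describe how to combine OpenCV with CUDA on Ubuntu, that is, how to make a .cu file and a .cpp file work together to produce the result we want. I searched the web extensively for this part, but there is very little material, and what exists only mentions it vaguely in passing. The method is very simple, but I will still explain it in detail here.

First, the goal: use OpenCV + CUDA to convert an image to grayscale.

The method is very simple: the averaging formula gray = ( r + g + b ) / 3 gives the value of each pixel. This computation is handed to CUDA, and the result is returned to the CPU for further processing and display.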
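Before moving the formula to the GPU, it can help to have a plain CPU version to compare against. The following is only a sketch: the function name `mean_gray_reference` is made up for illustration, and it assumes a tightly packed, interleaved 3-channel 8-bit buffer with no row padding, which is not guaranteed for every image buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for gray = (r + g + b) / 3.
// Assumption: `bgr` holds rows * cols pixels, 3 bytes each, with no padding.
std::vector<std::uint8_t> mean_gray_reference(const std::vector<std::uint8_t>& bgr,
                                              int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    assert(bgr.size() == static_cast<std::size_t>(rows) * cols * 3);

    std::vector<std::uint8_t> gray(static_cast<std::size_t>(rows) * cols);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        // Sum in int so that three 8-bit values cannot overflow.
        int sum = bgr[3 * i] + bgr[3 * i + 1] + bgr[3 * i + 2];
        gray[i] = static_cast<std::uint8_t>(sum / 3);
    }
    return gray;
}
```

Because the three channels are simply averaged, the channel order (BGR or RGB) does not affect the result.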

Code first:

test.cu file

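#include <opencv2/core/cuda_devptrs.hpp>
#include "device_launch_parameters.h"
#include "cuda.h"
#include "cuda_runtime_api.h"
using namespace cv;
using namespace cv::gpu;

// Custom kernel: one thread per pixel, writes the mean of the three channels.
// Despite the swap_rb name, this kernel computes a grayscale value.
__global__ void swap_rb_kernel(const PtrStepSz<uchar3> src, PtrStep<unsigned char> dst)
{
  int y = blockIdx.y*blockDim.y+threadIdx.y;
  int x = blockIdx.x*blockDim.x+threadIdx.x;
  // Threads in the edge blocks may fall outside the image; skip them.
  if(x < src.cols && y < src.rows)
  {
       uchar3 p = src(y, x);
       dst(y, x) = (p.x + p.y + p.z) / 3;
  }
}

// Host-side launcher, exported with C linkage so that testcpp.cpp can call it.
extern "C" void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<unsigned char> dst, cudaStream_t stream)
{
  const int tile = 8;  // threads per block along each axis

  // Round up so that partially filled tiles at the right and bottom edges are covered.
  dim3 grid((src.cols + tile - 1) / tile, (src.rows + tile - 1) / tile);
  dim3 block(tile, tile);
  swap_rb_kernel<<<grid, block, 0, stream>>>(src, dst);
  // On the default stream, wait here until the kernel has finished.
  if(stream == 0)
        cudaDeviceSynchronize();
}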
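The grid size in the launcher is a ceiling division: the number of 8x8 blocks needed to cover the image in each direction. A common pitfall is dropping the parentheses, because `/` binds tighter than `+` and `-`. The small sketch below (helper names are made up for illustration) contrasts the two forms in plain C++.

```cpp
#include <cassert>

// Ceiling division: how many blocks of size `block` are needed to cover `n` items.
int div_up(int n, int block)
{
    assert(n >= 0 && block > 0);
    return (n + block - 1) / block;
}

// The same expression without parentheses. It parses as n + block - (1 / block),
// and for block > 1 the integer division 1 / block is 0, so this returns n + block.
int div_up_unparenthesized(int n, int block)
{
    return n + block - 1 / block;
}
```

For a 640-pixel-wide image and a block width of 8, `div_up` gives 80 blocks, while the unparenthesized form gives 648. The kernel's bounds check would still keep the output correct in that case, but most of the launched threads would do nothing.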

testcpp.cpp

#include "stdio.h"
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/gpu/gpu.hpp"
#include "opencv2/ml/ml.hpp"
#include <opencv2/gpu/stream_accessor.hpp>
#include "cuda.h"
#include "cuda_runtime_api.h"
#include <cmath>
#include <iostream>
// MSVC-only auto-linking; on Ubuntu the libraries are passed to g++ with -l.
#ifdef _MSC_VER
#ifdef _DEBUG
#pragma comment(lib, "opencv_core249d.lib")
#pragma comment(lib, "opencv_imgproc249d.lib")
#pragma comment(lib, "opencv_highgui249d.lib")
#else
#pragma comment(lib, "opencv_core249.lib")
#pragma comment(lib, "opencv_imgproc249.lib")
#pragma comment(lib, "opencv_highgui249.lib")
#endif // _DEBUG
#endif // _MSC_VER
using namespace cv;
using namespace cv::gpu;
// Defined in test.cu; the parameter types must match that definition.
extern "C" void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<unsigned char> dst, cudaStream_t stream);

// Wrapper that allocates the output and forwards the GpuMat objects to the CUDA launcher.
void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null())
{
    CV_Assert(src.type() == CV_8UC3);
    dst.create(src.size(), CV_8UC1);
    cudaStream_t s = StreamAccessor::getStream(stream);
    // GpuMat converts implicitly to PtrStepSz<T> and PtrStep<T>.
    swap_rb_caller(src, dst, s);
}

int main()
{
    cudaSetDevice(0);
    cudaFree(0);    // forces CUDA context creation so it is not counted in the timing
    printf("hello\n");
    Mat image = imread("test.png");
    if (image.empty())
    {
        printf("failed to load test.png\n");
        return 1;
    }
    imshow("src",image);
    GpuMat gpuMat,output;
    output.create(image.size(), CV_8UC1);
    Mat result;
    result.create(image.size(), CV_8UC1);
    // GPU path: the measured time includes upload, kernel, and download.
    double start = (double)getTickCount();
    gpuMat.upload(image);
    swap_rb(gpuMat,output);
    output.download(result);
    double t = ((double)getTickCount() - start)/getTickFrequency();
    printf("gpu time is : %d ms\n", (int)(1000 * t));
    // CPU path for comparison, using OpenCV's own grayscale conversion.
    Mat result2;
    start = (double)getTickCount();
    cvtColor(image, result2, COLOR_BGR2GRAY);
    t = ((double)getTickCount() - start)/getTickFrequency();
    printf("cpu time is : %d ms\n", (int)(1000 * t));
    imshow("gpu", result);
    imshow("cpu", result2);
    waitKey(0);
    return 0;
}
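One thing to keep in mind when comparing the two windows: the kernel averages the three channels, while `cvtColor` with `COLOR_BGR2GRAY` is documented as a weighted sum, Y = 0.299 R + 0.587 G + 0.114 B. The two images are therefore not expected to match pixel for pixel. The sketch below illustrates the difference for a single pixel; the function names are made up, and the floating-point rounding is an assumption that only approximates what OpenCV computes internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Plain average of the three channels, as in the custom kernel.
std::uint8_t gray_mean(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    return static_cast<std::uint8_t>((b + g + r) / 3);
}

// Weighted sum with the documented BGR2GRAY coefficients.
// Assumption: rounding to nearest in double precision, for illustration only.
std::uint8_t gray_weighted(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    double y = 0.299 * r + 0.587 * g + 0.114 * b;
    return static_cast<std::uint8_t>(std::lround(y));
}
```

For a pure blue pixel (B = 255, G = 0, R = 0) the average is 85 while the weighted value is 29, so saturated colors look noticeably different between the two outputs. Neutral gray pixels come out the same with both formulas.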

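Seen this way, the problem really comes down to calling a function defined in the .cu file from the .cpp file.

As you have probably noticed, handing the image to the kernel is just the GpuMat -> PtrStepSz conversion (the actual copy from CPU to GPU memory is done by upload()). Note that a 3-channel Mat requires PtrStepSz<uchar3>, while a single-channel one requires PtrStepSz<unsigned char>.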
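To make the role of PtrStep / PtrStepSz more concrete, here is a minimal host-side sketch of the idea behind them: a raw pointer plus a row stride in bytes, with an element accessor that takes the stride into account. `PitchedView` and `Bgr` are hypothetical stand-ins written for this illustration, not OpenCV's actual classes, and the real types live in device code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a pitched 2D view (not OpenCV's implementation).
template <typename T>
struct PitchedView {
    unsigned char* data;  // first byte of row 0
    std::size_t step;     // distance between consecutive rows, in bytes
    int rows, cols;

    // Element (y, x): jump y rows using the byte stride, then index x elements.
    T& operator()(int y, int x) const
    {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return reinterpret_cast<T*>(data + static_cast<std::size_t>(y) * step)[x];
    }
};

// Three-byte pixel with the same field names as CUDA's uchar3.
struct Bgr { std::uint8_t x, y, z; };
```

The element type decides how many bytes one column index skips, which is why a 3-channel image is viewed through a 3-byte element type and a single-channel image through a 1-byte one. Rows may be padded, so the stride can be larger than cols * sizeof(T).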

For details about PtrStepSz, see http://docs.opencv.org/2.4/modules/gpu/doc/data_structures.html

With the program written, let's try to run it.

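First, compile the .cu file. In a terminal, run:

nvcc -c test.cu -o test.o

On success this produces test.o, an object file.

Then run:

ar cqs libtest5.a test.o

This creates the static library libtest5.a. You can choose a different library name, as long as the -l option used later matches it.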

Next, copy the library into a directory on Ubuntu's default library search path:

sudo cp libtest5.a /usr/lib/

Now the cpp program can be compiled:

g++ testcpp.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -L./ -ltest5 -o testcpp -lcudart -lpthread

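Pay close attention here. If the linker reports "DSO missing from command line", first check that the corresponding library has actually been copied into /usr/lib/. If it has, move the -l options to the end of the command. GNU ld processes its inputs from left to right, so a library generally has to appear after the files that reference it.

This produces the familiar executable file.

Running it gives the result: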