Nvidia CUDA 3.0 Updates

love19820823


- Section 1.2
- Updated figure

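A diagram was added that better explains that CUDA is not just a language but a platform, on top of which other languages or programming environments can be built. CUDA has its own ISA and its own PTX code, so don't think of CUDA simply as a programming language. In principle one could even design chips or hardware around the CUDA architecture, though that would require detailed CUDA documentation, which isn't available, at least not yet...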

- Section 2.5
- Mentioned the Fermi architecture

It notes that Fermi is the compute capability 2.x architecture, while all earlier GPUs are 1.x. Fermi is a genuine step forward.
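A minimal sketch of how an application can tell a Fermi (2.x) device from a 1.x device at runtime, using the runtime API's cudaGetDeviceProperties. The helper isFermiOrLater is a hypothetical name introduced for illustration, and treating any major version of 2 or above as "Fermi or later" assumes that newer architectures report higher major numbers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: compute capability 2.x is Fermi; 1.x is pre-Fermi.
// Assumes later architectures report a major version above 2.
__host__ bool isFermiOrLater(int major) {
    return major >= 2;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s, compute capability %d.%d (%s)\n",
                    dev, prop.name, prop.major, prop.minor,
                    isFermiOrLater(prop.major) ? "Fermi or later" : "pre-Fermi 1.x");
    }
    return 0;
}
```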

- Section 3.1
- Heavily rewritten to clarify binary, ptx, application, C++ compatibility
- __noinline__ behaves differently for compute capability 2.0 and higher

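It describes how NVCC relates to binary code, PTX, applications, and C++. CUDA kernels can also be written in CUDA's own instruction set: these assembly-like instructions are PTX, which is documented in more detail in its own manual.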
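To make the PTX relationship concrete, here is a small sketch that embeds one PTX instruction in a kernel through inline assembly, reading the %laneid special register (the thread's index within its warp). Inline PTX via asm() is documented in later CUDA releases and may not be covered by the 3.0 guide; the kernel and helper names are hypothetical. Running nvcc -ptx on the file shows the PTX that nvcc generates for the whole kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads the PTX special register %laneid with inline PTX.
__device__ unsigned int laneId() {
    unsigned int lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

__global__ void writeLaneIds(unsigned int* out) {
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = laneId();
}

int main() {
    const int n = 64;
    unsigned int h_out[n] = {0};
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(unsigned int)) != cudaSuccess) {
        std::printf("No usable CUDA device\n");
        return 0;
    }
    writeLaneIds<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // In a 1D block with 32-thread warps, thread i is lane i % 32.
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != (unsigned int)(i % 32)) {
            std::printf("mismatch at %d\n", i);
            return 1;
        }
    }
    std::printf("lane ids ok\n");
    return 0;
}
```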

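Section 3.1.1 describes the nvcc compilation workflow in detail: how .cu files (CUDA source) are compiled into object files, and how the C/C++ host portions are handed off to the host C or C++ compiler.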
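A minimal single-file example of the split nvcc performs, assuming a hypothetical saxpy.cu: the __global__ function is device code compiled by nvcc, while main() is host code passed on to the host compiler. The compile commands in the comments are a sketch of typical usage.

```cuda
// saxpy.cu: one source file mixing device code and host code.
//   nvcc -c saxpy.cu -o saxpy.o   # compile to an object file
//   nvcc saxpy.cu -o saxpy        # compile and link an executable
#include <cstdio>
#include <cuda_runtime.h>

// Device code: nvcc compiles this to PTX and/or a cubin.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Host code: forwarded to the host compiler; nvcc rewrites the
// <<<...>>> launch into runtime API calls.
int main() {
    const int n = 1024;
    static float hx[n], hy[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float* dx = nullptr;
    float* dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dx);
    cudaFree(dy);
    std::printf("y[0] = %f\n", hy[0]);  // 5.0 (3 * 1 + 2) when a GPU is present
    return 0;
}
```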

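Section 3.1.2 covers binary code and explains what the -code option means. For example, a binary built with -code=sm_13 targets compute capability 1.3. A cubin built for compute capability X.y is only guaranteed to run on devices of compute capability X.z with z ≥ y: compatibility holds across later minor revisions, but not across major revisions, so a 1.3 cubin is not guaranteed to run on a 2.0 device.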
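The rule can be written down as a small host-side check. This is a sketch with a hypothetical helper name (cubinRunsOn), not an API that CUDA provides:

```cuda
#include <cstdio>

// Hypothetical helper encoding the binary compatibility rule: a cubin built
// for compute capability X.y runs on X.z with z >= y, never across majors.
__host__ bool cubinRunsOn(int binMajor, int binMinor, int devMajor, int devMinor) {
    return devMajor == binMajor && devMinor >= binMinor;
}

int main() {
    std::printf("sm_13 cubin on a 1.3 device: %d\n", cubinRunsOn(1, 3, 1, 3));  // 1
    std::printf("sm_13 cubin on a 2.0 device: %d\n", cubinRunsOn(1, 3, 2, 0));  // 0
    std::printf("sm_11 cubin on a 1.0 device: %d\n", cubinRunsOn(1, 1, 1, 0));  // 0
    return 0;
}
```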

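Section 3.1.3 briefly notes that PTX instructions can generally be executed, but some instructions are only supported on devices of higher compute capability.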
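A common way to cope with instructions that only exist on newer devices is to branch on the __CUDA_ARCH__ macro at compile time. The sketch below, with a hypothetical helper atomicAddFloat, uses the single-precision atomicAdd that the changelog lists for compute capability 2.0, and otherwise falls back to an atomicCAS loop (which itself assumes global-memory atomics, i.e. compute capability 1.1 or higher).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. __CUDA_ARCH__ is defined only while nvcc compiles
// device code, with the target's value (e.g. 200 for compute capability 2.0).
__device__ float atomicAddFloat(float* addr, float val) {
#if __CUDA_ARCH__ >= 200
    return atomicAdd(addr, val);  // native float atomicAdd on 2.0+
#else
    // Fallback: retry a compare-and-swap on the float's bit pattern.
    int* p = (int*)addr;
    int old = *p, assumed;
    do {
        assumed = old;
        old = atomicCAS(p, assumed, __float_as_int(val + __int_as_float(assumed)));
    } while (assumed != old);
    return __int_as_float(old);
#endif
}

__global__ void sumKernel(const float* in, int n, float* total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddFloat(total, in[i]);
}

int main() {
    const int n = 1000;
    static float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 0.5f;

    float* d_in = nullptr;
    float* d_total = nullptr;
    float h_total = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_total, &h_total, sizeof(float), cudaMemcpyHostToDevice);

    sumKernel<<<(n + 255) / 256, 256>>>(d_in, n, d_total);
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);

    // With a GPU: 500 (every partial sum of 0.5 values is exact in float).
    std::printf("total = %f\n", h_total);
    cudaFree(d_in);
    cudaFree(d_total);
    return 0;
}
```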

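Section 3.1.4 describes how binaries and PTX code built for a given compute capability behave on future hardware. The guide recommends including PTX code, which the driver compiles just-in-time when the application is loaded, so that the application can run on newer devices and take advantage of their features. The reasoning: on current hardware a single PTX instruction may actually be implemented with several native instructions, because it is not yet supported natively. Once future hardware executes that instruction natively, code JIT-compiled from PTX picks up the improvement automatically.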
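A sketch of the build side, assuming a hypothetical app.cu: embed both a cubin and PTX so that the driver can JIT-compile the PTX on devices the cubin does not cover. The -gencode syntax shown is that of nvcc releases that still accepted 1.x targets (current releases have dropped them), and ptxRunsOn is a hypothetical helper encoding the rule that PTX for compute capability X.y can be compiled for any equal or greater compute capability.

```cuda
// Build sketch:
//   nvcc app.cu -gencode arch=compute_13,code=sm_13 \
//               -gencode arch=compute_13,code=compute_13
// The first -gencode embeds a cubin for 1.3 devices; the second embeds the
// compute_13 PTX, which the driver can JIT-compile at load time on a newer
// device such as a 2.0 (Fermi) GPU.
#include <cstdio>

// Hypothetical helper: PTX for compute capability X.y can be compiled to
// binary code for any equal or greater compute capability.
__host__ bool ptxRunsOn(int ptxMajor, int ptxMinor, int devMajor, int devMinor) {
    return devMajor > ptxMajor || (devMajor == ptxMajor && devMinor >= ptxMinor);
}

int main() {
    std::printf("compute_13 PTX on a 1.3 device: %d\n", ptxRunsOn(1, 3, 1, 3));  // 1
    std::printf("compute_13 PTX on a 2.0 device: %d\n", ptxRunsOn(1, 3, 2, 0));  // 1
    std::printf("compute_20 PTX on a 1.3 device: %d\n", ptxRunsOn(2, 0, 1, 3));  // 0
    return 0;
}
```

Compared with the cubin rule, the only difference is that PTX is forward compatible across major revisions as well.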

Section 3.1.5 explains which C++ features are supported: not all of C++ is, and the supported subset is listed in the appendix.
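A small sketch of C++ features in device code: a function template and an overloaded operator. Support for these is described in the guide's C++ appendix; the exact feature set depends on the CUDA version and compute capability, and the names below (Vec2, twice, demo) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Vec2 {
    float x, y;
};

// Operator overloading, callable from both host and device code.
__host__ __device__ Vec2 operator+(Vec2 a, Vec2 b) {
    Vec2 r = {a.x + b.x, a.y + b.y};
    return r;
}

// A function template instantiated inside a kernel.
template <typename T>
__device__ T twice(T v) {
    return v + v;
}

__global__ void demo(int* outInt, Vec2* outVec) {
    *outInt = twice(21);     // T = int
    Vec2 v = {1.0f, 2.0f};
    *outVec = twice(v);      // T = Vec2, uses the overloaded operator+
}

int main() {
    int* dInt = nullptr;
    Vec2* dVec = nullptr;
    cudaMalloc(&dInt, sizeof(int));
    cudaMalloc(&dVec, sizeof(Vec2));
    demo<<<1, 1>>>(dInt, dVec);

    int hInt = 0;
    Vec2 hVec = {0.0f, 0.0f};
    cudaMemcpy(&hInt, dInt, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hVec, dVec, sizeof(Vec2), cudaMemcpyDeviceToHost);
    std::printf("%d (%f, %f)\n", hInt, hVec.x, hVec.y);  // 42 (2, 4) with a GPU

    cudaFree(dInt);
    cudaFree(dVec);
    return 0;
}
```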

- Section 3.2
- Clarified that a CUDA context is created under the hood when initializing
the runtime and therefore CUDA resources are only valid in the context of
the host thread that initialized the runtime
- Updated graphics interoperability sections to new API

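It explains that every resource the CUDA runtime creates lives in one context, the context created for the host thread that initialized the runtime. As the guide also discusses later, one host thread drives one GPU.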
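A minimal sketch of the context behavior, assuming device 0: the runtime creates its context implicitly, and calling cudaFree(0) right after cudaSetDevice is a widely used idiom to force that creation up front. Note that since CUDA 4.0 the runtime shares one primary context per device across host threads, so the one-thread-per-GPU restriction described here applies to CUDA 3.0-era runtimes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 0;
    }

    cudaSetDevice(0);               // the GPU this host thread will use
    cudaError_t err = cudaFree(0);  // forces the implicit context creation
    std::printf("context init: %s\n", cudaGetErrorString(err));

    // This allocation belongs to that context. Under CUDA 3.0 it was valid
    // only in this host thread; since CUDA 4.0 all host threads share the
    // device's primary context.
    float* d = nullptr;
    cudaMalloc(&d, 256 * sizeof(float));
    cudaFree(d);
    return 0;
}
```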

- Section 3.2.1
- Mentioned 40-bit address space for devices of compute capability 2.0

Devices of compute capability 2.0 have a 40-bit address space.
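For scale, the arithmetic behind the 40-bit figure, compared with a 32-bit address space (the 32-bit comparison assumes the address space of compute capability 1.x devices):

```latex
\begin{aligned}
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} = 4\ \text{GiB} \\
2^{40}\ \text{bytes} &= 1\,099\,511\,627\,776\ \text{bytes} = 1\ \text{TiB} = 2^{8} \times 4\ \text{GiB}
\end{aligned}
```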

- Section 3.2.5.3
- Mentioned atomics to mapped page-locked memory

It notes that atomic operations on mapped page-locked memory are not atomic from the point of view of the host or of other devices.
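A sketch of atomics on mapped page-locked memory, using the runtime calls cudaSetDeviceFlags, cudaHostAlloc and cudaHostGetDevicePointer. The atomicAdd calls are atomic only among threads of this device; consistent with the guide's warning, the host touches the counter only before the launch and after the kernel has finished. countUp is a hypothetical kernel name, and the sketch assumes a device that supports mapping host memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void countUp(unsigned int* counter) {
    // Atomic among threads of this device only, not with respect to the
    // host or other devices.
    atomicAdd(counter, 1u);
}

int main() {
    // Must be set before the runtime creates the context.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    unsigned int* hCounter = nullptr;
    if (cudaHostAlloc((void**)&hCounter, sizeof(unsigned int),
                      cudaHostAllocMapped) != cudaSuccess) {
        std::printf("mapped page-locked allocation failed\n");
        return 0;
    }
    *hCounter = 0;

    unsigned int* dCounter = nullptr;
    cudaHostGetDevicePointer((void**)&dCounter, hCounter, 0);

    countUp<<<4, 256>>>(dCounter);
    cudaDeviceSynchronize();  // cudaThreadSynchronize in CUDA 3.0-era code

    // Read only after synchronization; 1024 on a device that supports mapping.
    std::printf("counter = %u\n", *hCounter);
    cudaFreeHost(hCounter);
    return 0;
}
```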

- Section 3.2.6
- Added concurrent kernel execution and concurrent data transfer for devices
of compute capability 2.0

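Previously, kernels could only be executed one after another; devices of compute capability 2.0 can execute multiple kernels concurrently, and can also perform host-to-device and device-to-host transfers concurrently.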
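A minimal sketch of launching two independent kernels in separate streams. On devices whose concurrentKernels property is non-zero they may overlap; otherwise they run back to back with the same results. The kernel name fill is hypothetical, and cudaDeviceSynchronize is the later name of the 3.0-era cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(float* data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int n = 1 << 16;
    float* a = nullptr;
    float* b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Independent kernels in different non-default streams: eligible to run
    // concurrently on devices that support it (compute capability 2.0+).
    fill<<<(n + 255) / 256, 256, 0, s1>>>(a, n, 1.0f);
    fill<<<(n + 255) / 256, 256, 0, s2>>>(b, n, 2.0f);
    cudaDeviceSynchronize();

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess) {
        std::printf("concurrentKernels = %d\n", prop.concurrentKernels);
    }

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```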

- Section 3.3
- Updated graphics interoperability sections to new API

The remaining changes mostly concern new functions:
- New Section 3.4 about interoperability between runtime and driver APIs
- Chapter 4 and 5 mostly rewritten with additional information
- Part of appendix A moved to new appendices G with additional information
- Section B.1.4
- Mentioned that kernel parameters are passed via constant memory for
devices of compute capability 2.0
- Section B.6
- Added new functions __syncthreads_count(), __syncthreads_and(), and
__syncthreads_or()
- Section B.10
- Mentioned atomics to mapped page-locked memory
- Section B.11
- Added new function __ballot()
- New Section B.12 on profiler counter function
- New Section B.14 on launch bounds
- Section C.1.1
- Updated error for some functions
- Updated based on FMAD being fused for compute capability 2.0
- Section C.1.2
- atomicAdd works with single-precision floating-point numbers for devices
of compute capability 2.0
- Updated error for some functions
- Section C.2.1
- Added new functions
- Section C.2.2
- Added new functions
- New Section D.6 about classes with non-virtual member functions for devices
of compute capability 2.0
- New appendix E for nvcc specifics (moved __noinline__, #pragma unroll to this
appendix and added __restrict)
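Several of the new device functions in this list can be shown in one kernel. The sketch below, with a hypothetical kernel countPositives, uses __launch_bounds__ (B.14), __syncthreads_count() (B.6) and a warp ballot (B.11). It assumes a warp size of 32 and a device of compute capability 2.0 or higher, and it calls __ballot_sync because current CUDA releases replaced the 3.0-era __ballot() with that variant.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __launch_bounds__(256): promises the compiler this kernel is never launched
// with more than 256 threads per block, which it can use for register allocation.
__global__ void __launch_bounds__(256)
countPositives(const int* in, int* blockCount, unsigned int* warpMasks) {
    int tid = threadIdx.x;
    int pred = in[blockIdx.x * blockDim.x + tid] > 0;

    // Block-wide barrier that also returns, to every thread, how many
    // threads of the block passed a non-zero predicate.
    int count = __syncthreads_count(pred);

    // One bit per lane of the warp, set where pred is non-zero.
    // CUDA 3.0 spelled this __ballot(pred).
    unsigned int mask = __ballot_sync(0xffffffffu, pred);

    if (tid == 0) blockCount[blockIdx.x] = count;
    if (tid % 32 == 0) warpMasks[blockIdx.x * (blockDim.x / 32) + tid / 32] = mask;
}

int main() {
    const int threads = 256;
    int h_in[threads];
    for (int i = 0; i < threads; ++i) h_in[i] = (i % 4 == 0) ? 1 : -1;  // 64 positives

    int* d_in = nullptr;
    int* d_count = nullptr;
    unsigned int* d_masks = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_count, sizeof(int));
    cudaMalloc(&d_masks, (threads / 32) * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    countPositives<<<1, threads>>>(d_in, d_count, d_masks);

    int h_count = 0;
    unsigned int h_masks[threads / 32] = {0};
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_masks, d_masks, sizeof(h_masks), cudaMemcpyDeviceToHost);

    // With a GPU: count = 64, and each warp mask is 0x11111111 (lanes 0, 4, ..., 28).
    std::printf("count = %d, warp 0 mask = 0x%08x\n", h_count, h_masks[0]);

    cudaFree(d_in);
    cudaFree(d_count);
    cudaFree(d_masks);
    return 0;
}
```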

Notes:

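I had been hoping 3.0 would bring some new features, but overall the changes are modest. The 3.0 programming guide itself is quite good, though: Chapter 3 is worth reading carefully, since it contains many detailed explanations, and is worth revisiting when you have time.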

PS: After seeing the VS2010 ad, I couldn't help sighing: who will be my next line of code...


2010-03-23 12:51
