- 浏览: 622398 次
- 性别:
- 来自: 杭州
文章分类
最新评论
-
oldrat:
引用Special cases aren't special ...
武汉大学开源技术俱乐部 技术交流 第1期 -
yzsunlight:
试了试 ,不行
Android Studio SDK Manager无法正常下载如何设置 -
qianjigui:
更全面的文档:http://www.5wpc.info/it/ ...
Ruby正则表达式操作参考 -
qianjigui:
Anddy 写道Anddy 写道tag是自动创建的吗? 能手动 ...
vim的跳转 -
Anddy:
Anddy 写道tag是自动创建的吗? 能手动创建吗? 在sh ...
vim的跳转
Cache Lab: Improving Program Locality
INTRODUCTION
This exercise deals with optimizing memory-intensive code. Image processing is one area that benefits greatly from such optimizations.
In this exercise we'll be optimizing two functions: rotate , a function designed to rotate an image 90 degrees clockwise, and smooth , a function designed to smooth (or more precisely to blur) an image. Your goal is to maximize the cache hit rate of these functions on a simulated L1 cache. We provide you with a cache simulator that simulates the performance of a computer's cache.
For our purposes, we will consider an image to be represented by a two-dimensional matrix M, where Mi,j denotes the (i ,j )th pixel of M. To simplify things, this assignment will deal only with square images. Rows and columns are numbered in zero-indexed fashion (like arrays), so rows and columns number from 0 to N-1, where N is the width/height of the image matrix. Pixel values consist of four 1-byte fields representing red, green, blue, and alpha.
LOGISTICS
The files needed for this assignment can be downloaded here . Once you've extracted the zip file to its own directory, you'll see a number of C source and header files:
File: | Function: | |
cache.c | Contains the code used for the cache simulation | |
cache.h | Header file for cache simulator | |
defs.h | Contains commonly used definitions and structures | |
cache.vcproj | Visual C++ project file | |
driver.c | The main program for testing various functions | |
rotate.c | Contains the rotate functions. You will modify this file. | |
smooth.c | Contains the smooth functions. You will modify this file. |
The only files you need to change, and the only files you will
submit, are rotate.c
and smooth.c
. You're
free to change the driver program as you see fit, but such
changes won't be submissible.
IMPLEMENTATION DETAILS
Data Representation:
The fundamental data structure of our images is the pixel structure, shown below:
typedef struct { unsigned short red : 8; unsigned short green : 8; unsigned short blue : 8; unsigned short alpha : 8; } pixel;
The structure definition above defines a 32-bit pixel, with 8 bits for each of the red, green, blue and alpha (opacity) components.
A two-dimensional square image of width n is stored in a one-dimensional array of pixel s; the (i, j)th pixel of the image is at Img[PIXEL(i,j,n)] , and PIXEL is defined as follows:
#define PIXEL(i,j,n) ((i)*(n)+(j))
In order to use the cache simulator, we call it indirectly through use of the COPY and SMOOTH macros defined in defs.h . You must use these macros for doing your COPY and SMOOTH operations.
These are all defined in defs.h .
Cache Structure:
The cache we will be simulating is a 16 KB direct-mapped cache, with 32 byte cache lines. You may wish to refer back to the notes to determine how best to optimize for such a configuration.
Rotate:
The following C function takes a source image, src , of size dim x dim and puts a rotated copy into the destination image dst .
void rotate_naive(int dim, pixel* src, pixel* dst) { int i, j; for(i=0; i<dim; i++) { for(j=0; j<dim; j++) { COPY(&dst[PIXEL(dim-1-j,i,dim)],&src[PIXEL(i,j,dim)]); } } return; }
This code traverses the rows of the source image, copying each into a column of the destination image. Your task is to try to maximize the number of cache hits by adjusting the algorithm to take advantage of the cache.
Smooth:
The following C function takes a source image of size dim x dim that is specified by src , and puts a 'smoothed' copy into the destination image dst . The actual smoothing is done by the SMOOTH macro, which takes in first the address of the destination pixel, and then the addresses of the source pixel and the 8 pixels surrounding it. For cases on the border of the image, COPY the pixel straight from the source to the destination, so as to not have to deal with the special case of not having 8 surrounding pixels.
void smooth_naive(int dim, pixel *src, pixel *dst) { int i, j; for(i=0; i<dim;i++) { COPY(&dst[PIXEL(i,0,dim)], &src[PIXEL(i,0,dim)]); COPY(&dst[PIXEL(i,dim-1,dim)], &src[PIXEL(i,dim-1,dim)]); } for(j=1; j<dim-1;j++) { COPY(&dst[PIXEL(0,j,dim)], &src[PIXEL(0,j,dim)]); COPY(&dst[PIXEL(dim-1,j,dim)], &src[PIXEL(dim-1,j,dim)]); } for(i=1; i<dim-1; i++) { for(j=1; j<dim-1; j++) { SMOOTH(&dst[PIXEL(j,i,dim)], &src[PIXEL(j,i,dim)], &src[PIXEL(j-1,i,dim)], &src[PIXEL(j+1,i,dim)], &src[PIXEL(j,i+1,dim)], &src[PIXEL(j,i-1,dim)], &src[PIXEL(j-1,i-1,dim)], &src[PIXEL(j+1,i+1,dim)], &src[PIXEL(j-1,i+1,dim)], &src[PIXEL(j+1,i-1,dim)]); } } return; }
The code first takes care of the edge cases and does a straight copy for the border. It then traverses the image in standard fashion, smoothing each pixel as it comes. As with rotate, your goal is to maximize cache hitrate by improving locality.
Evaluation:
The improved algorithms you submit will be graded based on the cache simulator included in the zip file you downloaded earlier. Functions will be run on images of a number of different sizes (listed below), and for each size will be given a hitrate equal to the total number of cache hits divided by the number of cache attempts in the image. (Higher numbers are better.) A function's 'hit score' will be determined by taking the geometric mean (explained below) of the ratios produced by dividing your function's hitrate by the naive implementation's hitrate.
For both rotate and smooth, a geometric mean of 5 numbers is computed by taking the 5th root of the product of those numbers, so for the five dimensions listed below the formula would be:
Assumptions:
To make optimization easier, you may assume that the image dimensions will always be a multiple of 32. Your code must be able to correctly rotate for all dimensions that are multiples of 32, but your performance scores will be determined based solely upon the values listed below:
64 | 128 | 256 | 512 | 1024 |
SETUP
Versioning
The rotate.c and smooth.c that you unzip contain only two functions each: the original naive implementation of their respective function, and a "register" function. This function provides an easy way to compare multiple functions at the same time, and is called by the driver program before testing.
void register_rotate_functions() { add_rotate_function(&rotate, rotate_descr); }
The function contains one or more calls to add_rotate_function. In the above example, add_rotate_function registers the function rotate along with a string rotate_descr which is an ASCII description of what the function does. See rotate.c to see how to create the string descriptions? The string can be at most 256 characters. The functions for smooth work analogously.
Testing
To test your functions, open the project file in Visual C++ and build it. You can add this project to a pre-existing (or new) VC++.Net Solution.
GRADING
This is how grading for the exercise will work:
-
Correctness: Your solutions must be 100% correct for any square image matrix with edge dimensions that are a multiple of 32. (The driver program will check correctness and will tell you if a particular implementation is incorrect.
-
Speed improvement: Your solutions will earn credit based on reaching a certain threshold, according to the tables below:
Rotate:
|
Smooth:
|
HINTS
-
The rotate function focuses on spatial locality: because each pixel is used only once, you should focus on using any pixels put into the cache by a previous pixel operation.
-
The smooth function benefits from spatial locality, but also reuses pixels it has read previously. Consequently, you should consider trying to improve temporal locality as well.
-
Try a large number of different functions. There is a FIND_BEST_OF_MANY #define flag in driver.c that can be used to find out which function provides the highest hit rate for each problem.
-
Just because your image is square doesn't mean you have to deal with the image in square pieces.
-
Remember the way things will be laid out in memory and how this affects what is put into the cache.
下面我简单说一下思路,我会再和同学讨论下,然后再给出更完备的解释。
rotate:利用步长修改算法,在最内层将i一步长为4进行循环
smooth:改变循环顺序(i和j)
- exercise5_handout.zip (8.5 KB)
- 下载次数: 93
- Exercise05.zip (59.9 KB)
- 描述: 我的解答
- 下载次数: 209
评论
Total Score: 100/100
Good work.
发表评论
-
Ruby 2.1 GC策略
2014-01-23 11:30 951对象管理主要涉及: Profiling support ... -
Google 持续集成介绍
2014-01-23 11:26 1563见附件PPT. 具体方案 构建描述 依赖分析 ... -
函数式编程 读后感
2013-12-30 15:24 1452一篇比较不错的文章: http://coolshel ... -
系统模块集成管理与版本控制学习
2013-12-27 12:01 1337论软件生命周期集成 http://www.infoq.com ... -
Ruby 动态特性鉴赏
2013-12-26 16:47 1335以下代码与代码学习来自<Ruby Best Prac ... -
Android应用插件化与动态部署 学习
2013-12-26 16:45 0通过REST将相关服务有语义的组合起来。 动态部署: ... -
用Markdown做文档的问题
2013-12-23 18:06 863一直有想一种语言能够解决文档编写问题。 一般文档编写 ... -
Android组件、通信与安全机制学习
2013-12-20 12:26 0现有问题: Android的组件间通信有哪些方法?其中的I ... -
Android root 原理学习
2013-12-15 23:51 2329学习资源: http://www.zhihu.com/qu ... -
global + Ruby
2012-11-16 13:07 1282http://simple-and-basic.com/200 ... -
Linux pthread线程同步相关的API学习
2012-11-12 18:43 1468原因 最近在深入理解Dalvik虚拟机的内部线程控制体系,其 ... -
MMTk代码学习(系统结构与流程)
2012-11-06 19:08 1652MMTk的整体结构和驱动模型主要由Plan, Collecto ... -
MMTk代码学习(RVM接口)
2012-11-06 14:52 1558前导 MMTk被RVM整个封装在后端,主要调用接口是 org ... -
MMTk代码学习(整体结构)
2012-11-05 17:03 2456必要的整体模块 对于一个完整的内存管理工具,主要涉及: ... -
嵌入式Java虚拟机 GC特性一览
2012-10-31 15:53 1297嵌入式Java虚拟机列表来源:http://en.wikipe ... -
Memory Analysis Tool OQL 用例汇总及语法学习
2012-10-28 16:36 2167典型用例 获取所有对象: SELECT * FROM $ ... -
Memory Analysis Tool 使用相关材料整理
2012-10-28 10:47 2009利用MAT分析问题 从转储(Dump)文件中调试并除错 ... -
手机设备操作系统架构图整理
2012-10-28 10:28 1539整体分析材料 Android,ChromeOS, WebO ... -
MMTk特性认识
2012-10-25 16:24 1765整体介绍 MMTk是一个内存管理的工具包 ,同时也是jik ... -
JavaScript V8 引擎相关资料
2012-10-25 14:54 1119V8 Javascript engine之所以快 针 ...
相关推荐
标题 "SSD06 Exercise04 个人解答" 暗示这可能是一个关于软件开发或编程练习的解答,特别是涉及到性能分析或者优化的环节。描述中的 "NULL" 没有提供额外的信息,但我们可以从标签 "源码" 和 "工具" 中推测,这个...
【标题】"SSD06 Exercise02 个人解答"主要涵盖了两个关键概念:源码分析和工具使用。这可能是某个课程或项目练习的一部分,其中作者Qianjigui分享了他在解决特定编程问题或实现某功能时的经验和理解。 在源码分析...
标题“SSD06 Exercise03 个人解答”暗示了一个编程练习或课程作业,其中可能涉及 SSD(固态存储)相关的技术,而 Exercise03 可能是该系列练习中的第三个部分。描述提到的“Ubuntu8.04+Gcc+Gdb”是一个古老的Linux...
NULL 博文链接:https://qianjigui.iteye.com/blog/256678
【标题】"SSD04 Exercise05 个人解答"主要涵盖了两个核心知识点:源码分析和工具使用。在这个练习中,作者分享了他对某个特定编程问题或项目的解答,这通常涉及到深入理解代码的运作机制,包括算法、数据结构以及...
标题“SSD04 Exercise06 个人解答”暗示了一个编程练习或项目,其中涉及到对Microsoft Calendar Control 10.0的使用。这个控制组件通常用于Windows应用程序开发,特别是使用Visual Basic 6 (VB6) 或其他支持ActiveX...
【标题】"SSD04 Exercise03 个人解答"主要涵盖了两个关键概念:源码分析和工具使用。这可能是某个课程或项目中的一个练习,其中"SSD04"可能代表课程编号或者阶段,而"Exercise03"则指示这是第三次实践任务。解答者...
【标题】"SSD04 Exercise04 个人解答"主要涵盖了两个关键知识点:源码理解和工具使用。在这个练习中,作者分享了他们对于特定编程问题的解决方案,可能涉及编程语言的深入理解、代码调试技巧以及如何有效地利用开发...
这是我的解答 博文链接:https://qianjigui.iteye.com/blog/248917
【SSD6 Exercise答案解析】 在信息技术领域,SSD6可能是“Software Engineering Studio 6”的简称,这是一门关于软件工程的课程,常见于大学的国际软件学院中。本课程可能涉及软件开发的全过程,包括需求分析、设计...
【SSD04 Exercise08 个人解答】 在这个学习实践中,我们主要关注的是与源码分析和工具使用相关的知识。这个题目可能源自于一个软件开发或计算机科学的课程,其中"SSD04"可能是课程代码,而"Exercise08"指的是第八个...