from: http://blog.bwhiting.co.uk/?p=341
Stage3D optimisation
Having been playing with Stage3D for a while now, I thought I would write a small piece on optimisation.
With great power comes great responsibility!
Stage3D gives you GPU access, which exposes some serious rendering horsepower, but if you don’t treat it with respect you’re going to run into limitations pretty quickly!
So what follows is a (very) rough guide on how to squeeze the most out of the new 3D APIs.
Rule 1 (of 1)
CPUs are fast and GPUs are faster, but communication between the two is probably the biggest bottleneck you will face.
Therefore: reduce it wherever possible.
This means minimising calls to the following Context3D functions:
Context3D.drawTriangles();
Context3D.setProgram();
Context3D.setProgramConstants…();
(Actually most of Context3D’s functions, but the above are the real doozies.)
drawTriangles()
GPUs can draw triangles fast, and lots of them: millions every second without breaking a sweat.
So you might think, “I can call drawTriangles() 50,000* times, no sweat, as long as I am only drawing a few triangles in each call.”
WRONG!! This command is a mighty expensive one so use it very wisely!
How: When you call drawTriangles() you pass it an index buffer (an IndexBuffer3D), and every three indices in it describe one triangle. Given that the call is expensive, it makes sense to pass as many triangles as possible in each one. Sadly this doesn’t mean you can simply group all of your geometry into big chunks. You cannot change state (alter the material or any parameters) during a call, so everything sent through it is rendered with the same program and the same set of constants. It does mean, however, that static (non-moving) elements sharing the same program/material can be combined into one list. Trees, grass and any other repeated geometry are good candidates for this. You can do the same for dynamic geometry, but it gets complex because you have to upload transformation data in a separate buffer; this is one way particle systems can be created. The downside is that whenever any of the objects changes, the whole buffer needs to be re-uploaded.

It is also vital that you only draw objects that will actually be seen on screen. Don’t draw anything that is out of view of the camera: frustum culling saves the day.
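Combining static geometry that shares a material comes down to baking each instance into world space and appending it to one shared vertex/index list. Every index has to be shifted by the number of vertices already in the list. Stage3D code would be ActionScript 3, so this is only a sketch in TypeScript. The `Mesh` and `Placement` types, the translation-only transform and the 65,535-vertex cap are assumptions for illustration, with plain arrays standing in for `Vector.<Number>` and `Vector.<uint>`.

```typescript
// Hypothetical mesh description: xyz positions plus triangle indices.
interface Mesh {
  positions: number[]; // x, y, z per vertex
  indices: number[];   // three indices per triangle
}

// Hypothetical static instance: a mesh plus a fixed world position
// (translation only, to keep the sketch short).
interface Placement {
  mesh: Mesh;
  offset: [number, number, number];
}

// Assumed per-buffer vertex limit (16-bit indices).
const MAX_VERTICES = 65535;

// Bake every static instance into world space and append it to one merged
// list, so the whole group can be drawn with a single drawTriangles() call.
function mergeStatic(placements: Placement[]): Mesh {
  const positions: number[] = [];
  const indices: number[] = [];
  for (const { mesh, offset } of placements) {
    // Index of this mesh's first vertex inside the merged list.
    const base = positions.length / 3;
    for (let i = 0; i < mesh.positions.length; i += 3) {
      positions.push(
        mesh.positions[i] + offset[0],
        mesh.positions[i + 1] + offset[1],
        mesh.positions[i + 2] + offset[2],
      );
    }
    for (const idx of mesh.indices) {
      indices.push(idx + base); // re-point the index at the merged vertex list
    }
  }
  if (positions.length / 3 > MAX_VERTICES) {
    throw new Error("batch too large: split it across several buffers");
  }
  return { positions, indices };
}
```

In Stage3D the merged positions would go into one VertexBuffer3D and the indices into one IndexBuffer3D (both via `uploadFromVector`), so a single draw call covers the whole group. The trade-off is the one described above: moving any one instance means rebuilding and re-uploading the whole batch.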
side note:
Even high-end games rarely want to issue more than 1,000-2,000 draw calls, although the likes of Battlefield 3 can get up to the 3,000 mark in some environments. Newer consoles, however, can issue over 10,000 draw calls and do so much faster thanks to more direct access to the hardware.
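Frustum culling, mentioned above, usually means testing a cheap bounding volume for each object against the planes of the camera frustum before anything is submitted. Below is a minimal TypeScript sketch using bounding spheres. It assumes the planes have already been extracted (typically from the view-projection matrix, not shown here) with normals pointing into the frustum; the `Plane`, `Sphere`, `isVisible` and `cull` names are hypothetical.

```typescript
// Plane n·p + d = 0, with the normal (nx, ny, nz) pointing into the frustum.
interface Plane { nx: number; ny: number; nz: number; d: number; }

// Bounding sphere of an object in world space.
interface Sphere { x: number; y: number; z: number; radius: number; }

// A sphere is rejected only if it lies entirely behind at least one plane.
// The test is conservative: a few spheres just outside a frustum corner
// still pass, which costs a draw call but never hides a visible object.
function isVisible(s: Sphere, frustum: Plane[]): boolean {
  for (const p of frustum) {
    const distance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
    if (distance < -s.radius) return false;
  }
  return true;
}

// Keep only the objects whose bounds survive the plane tests.
function cull<T extends { bounds: Sphere }>(objects: T[], frustum: Plane[]): T[] {
  return objects.filter((o) => isVisible(o.bounds, frustum));
}
```

Only what `cull` returns would then be sorted and sent to the GPU, so the culled objects never cost a draw call.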
setProgram()
Assuming you have now done everything in your power to reduce the number of draw calls you issue, the next thing to look at is state changes (changing the current program).
Changing state on the GPU might seem like a trivial thing but it is actually something you want to keep to a minimum to be able to squeeze the most out of your graphics card.
There are a few things you can do here to reduce this problem.
1. Group the objects that require drawing by their material/program! For example, suppose you had 100 cubes, 50 with one material and 50 with another. If you had a list containing all of those cubes and blindly sent them to be rendered, you could end up changing state a large number of times. If each cube in the list happened to have a different material from the one before it, the program would have to be updated for every draw call. Not good. If that list were instead sorted so that all cubes sharing a material sat next to each other, the program would only need to be set twice, once per group.
- even if you are only drawing one triangle with this call there
*there is an actual limit of 32,768 drawTriangles() calls per present() call.
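The grouping in point 1 amounts to sorting the draw list by program and only switching program when it actually differs from the current one. In the TypeScript sketch below, the `Renderable` type and the string `programId` are hypothetical stand-ins. Each counted switch marks where a real renderer would call `setProgram()`.

```typescript
// Hypothetical draw-list entry: which object, and which program it needs.
interface Renderable { id: number; programId: string; }

// Walk the list in order, switching program only when it changes.
// Returns how many program switches (setProgram() calls) would be issued.
function countProgramSwitches(list: Renderable[]): number {
  let current: string | null = null;
  let switches = 0;
  for (const r of list) {
    if (r.programId !== current) {
      current = r.programId; // a real renderer would call setProgram() here
      switches++;
    }
    // ...per-object constants and drawTriangles() would follow here
  }
  return switches;
}

// Return a copy of the list ordered so identical programs are adjacent.
function sortByProgram(list: Renderable[]): Renderable[] {
  return [...list].sort((a, b) =>
    a.programId < b.programId ? -1 : a.programId > b.programId ? 1 : 0,
  );
}
```

With the 100 alternating cubes from the example, the unsorted list needs 100 switches and the sorted one needs 2. The sort costs some CPU time each frame, which this sketch assumes is cheaper than the redundant state changes it removes.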
setProgramConstants…()
This function is what allows us to upload constants to the GPU. It is how we upload our matrices and any float1/2/3/4 values we want to use in our shaders. While it may not be a huge bottleneck, in my experiments it still had a noticeable impact on performance.
So how to optimise?
Any constant that is likely to be reused by different materials should be uploaded only once per frame, not once per object rendered. So what are the likely culprits?
The view projection matrix! This is 16 float values that do not change between objects, so it makes sense to upload it once: 16 numbers instead of 16,000 for 1,000 objects, and 999 fewer calls to setProgramConstants…(), which is a good thing!
The same applies to anything else that does not vary between objects: camera and light positions, or common numbers used in shaders (0, 0.5, 1, -1...).
What this shows is that it is important to have some sort of system to manage uploads so you can keep track of what is already uploaded and only upload data that isn’t already there!
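One possible shape for such a system is a cache that remembers what was last uploaded to each constant register and skips identical uploads. Below is a TypeScript sketch. `ConstantCache` is hypothetical, and the `upload` callback stands in for whatever wraps `Context3D.setProgramConstantsFromVector()`. The sketch also assumes uploads never partially overlap a range that is already cached.

```typescript
// Stand-in for the real upload (e.g. a wrapper around setProgramConstantsFromVector).
type Upload = (register: number, values: number[]) => void;

// Hypothetical helper: tracks the data last sent to each starting register
// and skips the upload when identical data is already resident.
// Assumes callers never write a range that partially overlaps a cached one.
class ConstantCache {
  private resident = new Map<number, number[]>();
  private upload: Upload;
  uploads = 0;

  constructor(upload: Upload) {
    this.upload = upload;
  }

  set(register: number, values: number[]): void {
    const prev = this.resident.get(register);
    if (prev && prev.length === values.length && prev.every((v, i) => v === values[i])) {
      return; // same data already on the GPU: no call needed
    }
    this.upload(register, values);
    this.resident.set(register, values.slice()); // copy so later edits by the caller are detected
    this.uploads++;
  }

  // Forget everything, e.g. after the context has been lost and recreated.
  invalidate(): void {
    this.resident.clear();
  }
}
```

Over a 1,000-object frame, a view-projection matrix set through this cache is sent once, while a per-object value that really changes is still sent every time. Comparing arrays costs a little CPU work in exchange for fewer CPU-to-GPU calls, which is the trade Rule 1 argues for.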
side note:
This also affects how you write shaders. Knowing that each constant requires an upload should sometimes make you rethink how to achieve something using minimal constants. Take unpacking a normal from a texture: usually you would multiply the value from the texture by 2 and then subtract 1 (2 floats required), but the same result can be achieved by subtracting 0.5 and then dividing by 0.5 (1 float required). Perhaps not the best example, but I am sure you get the idea. REUSE is your friend!
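The two forms really are equivalent, because (t - 0.5) / 0.5 = 2t - 1. The small TypeScript check below runs both on the CPU. In AGAL terms the second form would be a `sub` followed by a `div` against the same constant component. The function names are illustrative only.

```typescript
// Map a texture channel in [0, 1] to a normal component in [-1, 1].

// Form A: needs two constants, 2 and 1.
function unpackTwoConstants(t: number): number {
  return t * 2 - 1;
}

// Form B: reuses a single constant, 0.5, for both steps.
function unpackOneConstant(t: number): number {
  const half = 0.5;         // the only constant value the shader would need
  return (t - half) / half; // algebraically identical to 2t - 1
}
```

Both functions return -1 for an input of 0 and 1 for an input of 1. In between they agree up to floating-point rounding.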
—————————————–
While I have only focused on 3 methods of Context3D, almost all of them incur some penalty; those highlighted are the ones I have found to be a more serious problem.
Quick additions:
Drawing to a BitmapData from the GPU is slow, so if you have to do it, keep the resolution as small as possible; in theory a 1×1 pixel readback should be big enough for picking!
Resizing the back buffer is slow!
Creating textures is slow, so don’t do it on the fly. Pre-allocate if possible, then pick from a pool (more relevant for post-processing).
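A texture pool can be as simple as free lists keyed by size. Textures are handed out on request and returned when a pass is done, and a new one is only created when the list for that size is empty. The TypeScript sketch below is hypothetical throughout. `PooledTexture` stands in for a real GPU texture handle, and `create()` marks where `Context3D.createTexture()` would be called.

```typescript
// Hypothetical stand-in for a GPU texture handle.
interface PooledTexture { width: number; height: number; id: number; }

// Free lists keyed by "WxH"; textures are reused instead of being
// created and disposed every frame.
class TexturePool {
  private free = new Map<string, PooledTexture[]>();
  private nextId = 0;
  created = 0; // how many textures have actually been created

  private key(w: number, h: number): string {
    return `${w}x${h}`;
  }

  // Create textures up front, e.g. at start-up, and park them in the pool.
  preallocate(w: number, h: number, count: number): void {
    for (let i = 0; i < count; i++) this.release(this.create(w, h));
  }

  // Reuse a free texture of this size, creating one only if none is free.
  acquire(w: number, h: number): PooledTexture {
    const tex = this.free.get(this.key(w, h))?.pop();
    return tex ?? this.create(w, h);
  }

  // Hand a texture back once a post-processing pass is finished with it.
  release(tex: PooledTexture): void {
    const k = this.key(tex.width, tex.height);
    const list = this.free.get(k) ?? [];
    list.push(tex);
    this.free.set(k, list);
  }

  private create(w: number, h: number): PooledTexture {
    this.created++;
    // Real code would call Context3D.createTexture() here.
    return { width: w, height: h, id: this.nextId++ };
  }
}
```

After preallocating the render targets a post-processing chain needs, steady-state frames only move handles between the pool and the passes, so no textures are created mid-frame.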
At some point in the future, I hope to write some test examples that highlight the cost of the functions mentioned above (have already done quite a few but they have been tied into other things rather than dedicated standalone tests).
I hope all of that makes some form of sense to someone. If you have any additions, corrections or questions… fire away. I will probably update this from time to time to add things I have missed; it’s a broad area with many possible optimisations!
Super Quick Summary:
REDUCE DRAW CALLS! Group items where possible and use culling to ensure you are only drawing what is necessary.
REUSE MATERIALS and BATCH RENDERING by material if you can.
UPLOAD THE MINIMUM NUMBER OF CONSTANTS you can get away with