August 19, 2009
Posted by Bartosz Milewski under C++, Concurrency, D Programming Language, Multithreading, Programming
What is there to reference counting that is not obvious? In any language that supports deterministic destruction and the overloading of the copy constructor and the assignment operator it should be trivial. Or so I thought until I decided to implement a simple ref-counted thread handle in D. Two problems popped up:
How does reference counting interact with garbage collection?
How to avoid data races in a multithreaded environment?
In purely garbage-collected languages, like Java, you don’t implement reference counting, period. Which is pretty bad, if you ask me. GC is great at managing memory, but not so good at managing other resources. When a program runs out of memory, it forces a collection and reclaims unused memory. When it runs out of, say, system thread handles, it doesn’t reclaim unused handles–it just dies. You can’t use GC to manage system handles. So, as far as system resources go, Java forces the programmer to use the moral equivalent of C’s malloc and free. The programmer must free the resources explicitly.
In C++ you have std::shared_ptr for all your reference-counting needs, but you don’t have garbage collection for memory–at least not yet. (There is also Microsoft’s C++/CLI, which mixes the two systems.)
D offers the best of both worlds: GC and deterministic destruction. So let’s use GC to manage memory and reference counting (or other policies, like uniqueness) to manage other limited resources.
First attempt
The key to reference counting is to have a “value” type, like the shared_ptr in C++, that can be cloned and passed between functions at will. Internally this value type must have access to a shared chunk of memory that contains the reference count. In shared_ptr this chunk is a separately allocated integral type–the counter. The important thing is that all clones of the same resource share the same counter. The counter’s value reflects how many clones there are. Copy constructors and assignment operators take care of keeping the count exact. When the count goes to zero–that is, when the last copy of the resource goes out of scope–the resource is automatically freed (for instance, by calling CloseHandle).
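To make the mechanics concrete, here is a minimal single-threaded sketch of such a value type in C++. The Handle alias and close_handle function are stand-ins for a real resource API like Win32’s HANDLE and CloseHandle; the separately allocated int plays the role of the shared counter.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins for a real resource API (e.g. HANDLE / CloseHandle).
using Handle = int;
void close_handle(Handle h) { std::printf("closing handle %d\n", h); }

// A value type whose clones share one separately allocated counter,
// in the style of shared_ptr. Not thread-safe--see the discussion below.
class RcHandle {
    Handle _h;
    int*   _count; // shared by all clones of the same resource
public:
    explicit RcHandle(Handle h) : _h(h), _count(new int(1)) {}
    RcHandle(const RcHandle& other) : _h(other._h), _count(other._count) {
        ++*_count; // one more clone
    }
    RcHandle& operator=(const RcHandle& other) {
        if (_count != other._count) { // guard against self-assignment
            release();
            _h = other._h;
            _count = other._count;
            ++*_count;
        }
        return *this;
    }
    ~RcHandle() { release(); }
    int count() const { return *_count; }
private:
    void release() {
        if (--*_count == 0) { // last copy going out of scope
            close_handle(_h);
            delete _count;
        }
    }
};
```

Copying bumps the shared count; each destructor decrements it; whichever copy sees the count hit zero closes the resource.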
In my first attempt, I decided that the memory allocated for the counter should be garbage-collected. After all, the important thing is to release the handle–the memory will take care of itself.
struct RcHandle {
    shared Counter _counter; // GC'd shared Counter object
    HANDLE _h;

    ~this() { // destructor
        if (_counter.dec() == 0) // access the counter
            CloseHandle(_h);
    }
    // methods that keep the counter up to date
}
RcHandle is a struct, which is a value type in D. Counter is a class, which is a reference type; so _counter really hides a pointer to shared memory.
A few tests later I got a nasty surprise. My program faulted while trying to access an already deallocated counter. How did that happen? How could garbage collector deallocate my counter if I still had a reference to it?
Here’s what I did in my test: I embedded the ref-counted handle inside another garbage-collected object:
class Embedder { // GC'd class object
    RcHandle _rc;
}
When the time came to collect that object (which happened after the program ran to completion), its finalizer was called. Whenever an object contains fields that have non-trivial destructors, the compiler generates a finalizer that calls the destructors of those embedded objects–_rc in this case. The destructor of the ref-counted handle checks the reference count stored in the counter. Unfortunately the counter didn’t exist anymore. Hours of debugging later I had the answer.
What happened is that the garbage collector had two objects on its list: the embedder and the counter. It just so happened that the collector decided to collect those two objects in reverse order: the counter first, then the embedding object. So, by the time it got to the finalizer of the embedding object, the counter was gone!
What I discovered (with the help of other members of the D team who were involved in the discussion) was that there are some limitations on mixing garbage collection with deterministic destruction. There is a general rule:
An object’s destructor must not access any garbage-collected objects embedded in it.
Since the destructor of the ref-counted handle must have access to the counter, the counter must not be garbage-collectible. That means only one thing: it has to be allocated using malloc and explicitly deallocated using free. Which brings us to the second problem.
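The fix, then, is to carve the counter out of the GC heap entirely. A minimal C-style sketch of what that looks like (the counter_* helpers are hypothetical names, not part of any library): the counter lives in malloc’d memory, so no collector will reclaim it behind the destructor’s back, and the last owner must free it explicitly.

```cpp
#include <cassert>
#include <cstdlib>

// The shared counter lives in malloc'd memory, outside any GC heap,
// so a finalizer running during collection can still safely touch it.
// The price: the last owner must free it explicitly.
struct Counter {
    int value;
};

Counter* counter_create() {
    Counter* c = static_cast<Counter*>(std::malloc(sizeof(Counter)));
    c->value = 1; // the creating handle is the first owner
    return c;
}

// Returns the new count; when it reaches zero, the caller owns cleanup.
int counter_dec(Counter* c) { return --c->value; }

void counter_destroy(Counter* c) { std::free(c); }
```

In the D version, RcHandle would call counter_destroy (and then CloseHandle) from its destructor when the count drops to zero.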
Concurrency
What can be simpler than an atomic reference count? On most processors you can atomically increment and decrement a memory location. You can even decrement and test the value in one uninterruptible operation. Problem solved! Or is it?
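The decrement-and-test operation the paragraph alludes to looks like this in C++ with std::atomic (a sketch of the counter operations only, not of the whole handle):

```cpp
#include <atomic>
#include <cassert>

// fetch_sub returns the PREVIOUS value, so seeing 1 means this thread
// just dropped the count to zero and is the last owner.
bool release_is_last(std::atomic<int>& count) {
    return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

void acquire(std::atomic<int>& count) {
    count.fetch_add(1, std::memory_order_relaxed);
}
```

Each operation is individually atomic and uninterruptible; the race described next arises not inside these operations, but in the gap between loading the counter’s address and calling one of them.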
There is one tiny complication–the location that you are atomically modifying might disappear. I know, this is totally counter-intuitive. After all, the management of the counter follows a simple rule: the last to leave the room turns off the light. If the destructor of RcHandle sees the reference count going from one to zero, it knows that no other RcHandle has access to it, and it can safely free the counter. Who can argue with cold logic?
Here’s the troubling scenario: RcHandle is embedded in an object that is visible from two threads:
class Embedder {
    RcHandle _rcH;
}
shared Embedder emb;
Thread 1 tries to overwrite the handle:
RcHandle myHandle1;
emb._rcH = myHandle1;
while Thread 2 tries to copy the same handle to a local variable:
RcHandle myHandle2 = emb._rcH;
Consider the following interleaving:
1. T2: Load the address of the _counter embedded in _rcH.
2. T1: Swap emb._rcH with myHandle1.
3. T1: Decrement the counter that was embedded in _rcH. If it’s zero (and it is, in this example), free the counter.
4. T2: Increment the _counter. Oops! This memory location has just been freed.
The snag is that there is a window between T2 reading the pointer in (1), and incrementing the location it’s pointing to in (4). Within that window, the reference count does not match the actual number of clients having access to the pointer. If T1 happens to do its ugly deed of freeing the counter within that window, the race may turn out deadly. (This problem has been known for some time and there were various proposals to fix it, for instance using DCAS, as in this paper on Lock-Free Reference Counting.)
Should we worry? After all, the C++ shared_ptr also exposes this race and nobody is crying havoc. It turns out that it all boils down to the responsibilities of the shared object (and I’m grateful to Andrei for pointing it out).
A shared object should not willy-nilly expose its implementation to clients
If the clients of Embedder want access to the handle, they should call a synchronized method. Here’s the correct, race-free implementation of the Embedder in the scenario I just described.
class Embedder {
private:
    RcHandle _rcH;
public:
    synchronized RcHandle GetHandle() const { return _rcH; }
    synchronized void SetHandle(RcHandle h) { _rcH = h; }
    ...
}
The method GetHandle copies _rcH and increments its count under the Embedder’s lock. Another thread calling SetHandle has no chance of interleaving with that action, because it is forced to use the same lock. D2 actually enforces this kind of protection for shared objects, so my original example wouldn’t even compile.
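The same discipline can be expressed in C++ with an explicit mutex. This sketch uses std::shared_ptr as a stand-in for RcHandle (the getHandle/setHandle names are mine, mirroring the D accessors): the embedder’s lock covers both the pointer read and the reference-count bump inside the copy, closing the window the racy interleaving exploited.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// C++ analog of the synchronized D accessors: every access to the
// embedded ref-counted handle goes through the embedder's own lock.
class Embedder {
    mutable std::mutex _m;
    std::shared_ptr<std::string> _rcH; // stands in for RcHandle
public:
    std::shared_ptr<std::string> getHandle() const {
        std::lock_guard<std::mutex> lock(_m);
        return _rcH; // pointer copy and count increment happen under the lock
    }
    void setHandle(std::shared_ptr<std::string> h) {
        std::lock_guard<std::mutex> lock(_m);
        _rcH = std::move(h); // old count decremented under the same lock
    }
};
```

A thread swapping the handle and a thread copying it now serialize on the same mutex, so the counter can never be freed between the load of its address and the increment.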
You might be thinking right now that all this is baloney because I’m trying to fit a square peg into a round hole. I’m imposing value semantics on a non-atomic object, and you cannot atomically overwrite a non-atomic object. However, using this logic, you could convince yourself that a double cannot have value semantics (on most common architectures doubles are too large to be atomic). And yet you can safely pass doubles between threads and assign them to each other (which is the same as overwriting). It’s only when you embed a double inside a shared object that you have to protect it from concurrent access. And it’s not the double protecting itself–it’s the shared embedder that is responsible for synchronization. It’s exactly the same with RcHandle.
Conclusions
Just when you thought you knew everything about reference counting you discover something you haven’t thought about. The take-home message is that mixing garbage collection with deterministic destruction is not a trivial matter, and that a race-free reference-counted object is vulnerable to races when embedded in another shared object. Something to keep in mind when programming in D or in C++.