{"id":546,"date":"2017-05-26T13:36:57","date_gmt":"2017-05-26T05:36:57","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/seteplia\/?p=546"},"modified":"2019-06-11T20:56:09","modified_gmt":"2019-06-12T03:56:09","slug":"managed-object-internals-part-1-layout","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/managed-object-internals-part-1-layout\/","title":{"rendered":"Managed object internals, Part 1. The layout"},"content":{"rendered":"<p>The layout of a managed object is pretty simple: a managed object contains instance data, a pointer to metadata (a.k.a. the method table pointer) and a bag of internal information also known as the object header.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image00210.png\"><img decoding=\"async\" style=\"border: 0px currentcolor;\" title=\"clip_image002\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/clip_image002_thumb9.png\" alt=\"clip_image002\" width=\"640\" height=\"239\" border=\"0\" \/><\/a><\/p>\n<p>The first time I read about it, I had a question: why is the layout of an object so weird? Why does a managed reference point into the middle of an object, with the object header at a negative offset? What information is stored in the object header?<\/p>\n<p>When I started thinking about the layout and did some quick research, I came up with a few possible explanations:<\/p>\n<ol>\n<li>The JVM has used a similar layout for its managed objects from the very beginning.<\/li>\n<li style=\"list-style-type: none;\">It may sound a bit crazy today, but remember that C# has one of the worst features of all time (a.k.a. <a href=\"https:\/\/blogs.msdn.microsoft.com\/ericlippert\/2007\/10\/17\/covariance-and-contravariance-in-c-part-two-array-covariance\/\">array covariance<\/a>) just because Java had it back in the day. 
And compared to that decision, reusing some ideas about the structure of an object doesn\u2019t sound that unreasonable.<\/li>\n<li>The object header can grow in size with no cross-cutting changes in the CLR.<\/li>\n<li style=\"list-style-type: none;\">The object header holds auxiliary information used by the CLR, and the CLR may one day need more than a pointer-sized field. Indeed, the .NET Compact Framework used in mobile phones has different headers for small and large objects (see <a href=\"https:\/\/blogs.msdn.microsoft.com\/abhinaba\/2012\/02\/02\/wp7-clr-managed-object-overhead\/\">WP7: CLR Managed Object overhead<\/a> for more details). The desktop CLR has never used this ability, but that doesn\u2019t rule it out in the future.<\/li>\n<li>Cache lines and other performance-related characteristics.<\/li>\n<\/ol>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cbrumme\/\">Chris Brumme<\/a>, one of the CLR architects, mentioned in a comment on his post \u201c<a href=\"https:\/\/devblogs.microsoft.com\/cbrumme\/value-types\/\">Value Types<\/a>\u201d that cache friendliness is the very reason for the managed object layout. Because of the cache line size (64 bytes), it is theoretically more efficient to access fields that are close to each other. This means that dereferencing the method table pointer and then accessing a field should perform differently depending on where the field sits inside the object. 
I\u2019ve spent some time trying to prove that this is still true for modern processors but was unable to produce any benchmark that showed a difference.<\/p>\n<p>After spending some time trying to validate my theories, I contacted <a href=\"https:\/\/blogs.msdn.microsoft.com\/vancem\/\">Vance Morrison<\/a> with this very question and got the following answer: the current design was made with no particular perf considerations.<\/p>\n<p>So the answer to the question \u201cWhy is the managed object\u2019s layout so weird?\u201d is simple: \u201chistorical reasons\u201d. And, to be honest, I can see the logic in placing the object header at a negative offset: it emphasizes that this piece of data is an implementation detail of the CLR, that its size can change over time, and that it should not be inspected by a user.<\/p>\n<p>Now it\u2019s time to inspect the layout in more detail. But before that, let\u2019s think about what extra information the CLR can associate with a managed object instance. Here are some ideas:<\/p>\n<ul>\n<li>Special flags that the GC can use to mark that an object is reachable from application roots.<\/li>\n<li>A special flag that tells the GC that an object is pinned and should not be moved during garbage collection.<\/li>\n<li>The hash code of a managed object (when the <b>GetHashCode<\/b> method is not overridden).<\/li>\n<li>A critical section and other information used by a lock statement, such as the thread that acquired the lock.<\/li>\n<\/ul>\n<p>Apart from instance state, the CLR stores a lot of information associated with a type, such as the method table, interface maps, instance size and so on, but this is not relevant to our current discussion.<\/p>\n<h4>IsMarked flag<\/h4>\n<p>The managed object header is a chameleon that serves many different purposes. So you may think that the garbage collector (GC) uses a bit from the object header to mark that the object is referenced by a root and should be kept alive. 
This is a common misconception, and a few very famous books are to blame (*).<\/p>\n<p>(*) Namely \u201cCLR via C#\u201d by Jeffrey Richter, \u201cPro .NET Performance\u201d by Sasha Goldshtein et al. and, definitely, some others.<\/p>\n<p>Instead of using the object header, the CLR authors decided to use a clever trick: during garbage collection, the lowest bit of the method table pointer stores a flag indicating that the object is reachable and should not be collected.<\/p>\n<p>Here is the actual implementation of the \u2018mark\u2019 flag from the coreclr repo, file <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/a6c2f7834d338e08bf3dcf9dedb48b2a0c08fcfa\/src\/gc\/gc.cpp\">gc.cpp<\/a>, line 8974 (**):<\/p>\n<pre class=\"lang:default decode:true\">#define marked(i) header(i)-&gt;IsMarked()\r\n#define set_marked(i) header(i)-&gt;SetMarked()\r\n#define clear_marked(i) header(i)-&gt;ClearMarked()\r\n \r\n\/\/ class CObjectHeader\r\nBOOL IsMarked() const\r\n{\r\n    return !!(((size_t)RawGetMethodTable()) &amp; GC_MARKED);\r\n}\r\nvoid ClearMarked()\r\n{\r\n    RawSetMethodTable(GetMethodTable());\r\n}\r\nvoid SetMarked()\r\n{\r\n    RawSetMethodTable((MethodTable*)(((size_t)RawGetMethodTable()) | GC_MARKED));\r\n}\r\nMethodTable* GetMethodTable() const\r\n{\r\n    return((MethodTable*)(((size_t)RawGetMethodTable()) &amp; (~(GC_MARKED))));\r\n}<\/pre>\n<p>(**) Unfortunately, the <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/a6c2f7834d338e08bf3dcf9dedb48b2a0c08fcfa\/src\/gc\/gc.cpp\">gc.cpp<\/a> file is so big that GitHub refuses to analyze it. This means that I can\u2019t add a hyperlink to a specific line of code.<\/p>\n<p>Managed pointers in a CLR heap are aligned on 4-byte or 8-byte address boundaries, depending on the platform. This means that the lowest 2 or 3 bits of every pointer are always 0 and can be used for other purposes. 
The JVM relies on the same alignment in a feature called <a href=\"http:\/\/docs.oracle.com\/javase\/7\/docs\/technotes\/guides\/vm\/performance-enhancements-7.html\">\u2018Compressed Oops\u2019<\/a>, which lets the JVM address a 32 GB heap while still using 4 bytes per managed pointer.<\/p>\n<p>Technically speaking, even on a 32-bit platform there are 2 bits that can be used for flags. A comment in the <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/a6c2f7834d338e08bf3dcf9dedb48b2a0c08fcfa\/src\/vm\/object.h#L642\">object.h<\/a> file suggests that this is indeed the case and that the second-lowest bit of the method table pointer is used for pinning (to mark that the object should not be moved during the compaction phase of garbage collection). Unfortunately, it is not clear whether this is true, because the <b>SetPinned<\/b>\/<b>IsPinned<\/b> methods from <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/a6c2f7834d338e08bf3dcf9dedb48b2a0c08fcfa\/src\/gc\/gc.cpp\">gc.cpp<\/a> (lines 3850-3859) are implemented using a reserved bit from the object header, and I was unable to find any code in the coreclr repo that actually sets that bit of the method table pointer.<\/p>\n<p><b>Next time<\/b> we\u2019ll discuss how locks are implemented and check how expensive they are.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The layout of a managed object is pretty simple: a managed object contains instance data, a pointer to a meta-data (a.k.a. method table pointer) and a bag of internal information also known as an object header. 
The first time I\u2019ve read about it, I\u2019ve got a question: why the layout of an object is so [&hellip;]<\/p>\n","protected":false},"author":4004,"featured_media":37840,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[6700],"tags":[6695],"class_list":["post-546","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-net-internals","tag-seteplia"],"acf":[],"blog_post_summary":"<p>The layout of a managed object is pretty simple: a managed object contains instance data, a pointer to a meta-data (a.k.a. method table pointer) and a bag of internal information also known as an object header. The first time I\u2019ve read about it, I\u2019ve got a question: why the layout of an object is so [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/546","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/4004"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=546"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/546\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/37840"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=546"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=546"},{"taxo
nomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=546"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}