0% found this document useful (0 votes)
20 views6 pages

Understanding Garbage Collection Basics

The document discusses garbage collection, an automatic memory management process that reclaims heap-allocated storage no longer in use, thus preventing memory leaks. It explains the mark-and-sweep algorithm, which identifies reachable memory blocks and frees those that are not marked as reachable. The document also highlights the challenges of implementing garbage collection in languages like C and C++, where pointer management is more complex.

Uploaded by

yihuangece
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views6 pages

Understanding Garbage Collection Basics

The document discusses garbage collection, an automatic memory management process that reclaims heap-allocated storage no longer in use, thus preventing memory leaks. It explains the mark-and-sweep algorithm, which identifies reachable memory blocks and frees those that are not marked as reachable. The document also highlights the challenges of implementing garbage collection in languages like C and C++, where pointer management is more complex.

Uploaded by

yihuangece
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd

[MUSIC].

All right.
now that we know a lot about memory
allocation and we know some of its
basics, we know some details,
implementations and so on.
let's talk about something called garbage
collection.
We're trying to make a lot of this
automatic.
Right.
So far, we have assumed that applications
are going to malloc and then free
whenever they don't need it anymore.
But it turns out that's, that can be a
pain and a big source of, of memory
problems like leaks and so on, okay?
So that's why a garbage collection was
invented, so it's it provides automatic
reclamation, reclamation of heap
allocated storage.
So, for example, suppose that here's a
function foo that has a pointer p.
It allocates this malloc, allocates, and
then when this function returns, this
pointer's going to be gone.
So there's nothing that points to the
allocated block anymore.
That means that the, the block piece is
now garbage, because nobody can reference
it, 'kay?
So and so you can just go and
automatically free it and garbage
collection does that.
This is automatic freeing of blocks that
are no longer useful, that we know are no
longer useful, okay?
So this is very common in functional
languages, scripting languages, and
modern objective-orientated languages.
So like, things like lists, ML, Java,
just common Java, Perl, Mathematica,
Python and so on, 'kay?
So there are also variants of garbage
collectors called conservative garbage
collectors for C and C++.
They had to be conservative because C and
C++ are very flexible in how they support
pointers and then makes implementing
garbage collection hard.
So they have to be conservative, okay?
It works but it can, it can't you might
not be able to collect all garbage.
So let's answer some of the core
questions of garbage collection.
The first one is knowing when memory can
be freed.
So in general, it's very hard to know
what's going to be used in the future,
since it depends on conditions and so on.
So we have to have a way of knowing that
it's never going to be used again.
But so we can know when certain blocks
are no longer reachable.
There's nothing that point to them.
So if, if you have blocks in your heap,
but we know there are no pointers to
those blocks, then we know for sure that
those blocks can be collected.
Okay, can, it can be freed.
But for that, the allocator needs to know
what is a pointer, what's not a pointer.
How can he do this?
Well, it needs language support.
You need to, or you need, you need to be
disciplined on how you use pointers.
Okay?
So there will be some support, either the
program is going to declare what's
pointer, what's that pointer, what the
language supports it and so on, okay?
So, we're going to make some assumptions
about pointers in the rest of this of
this video.
First memory allocator can distinguish
pointers from non-pointers, okay?
So all pointers point to the start of a
block, in the heap, and the application
cannot hide pointers.
For example, it cannot cast them to int
and then back again to hide pointers,
'kay?
So that's what I mentioned before,
disciplined use of pointers, 'kay?
There are many garbage collection
algorithms.
Now, one of the classic point, classic
ones is market sweep, which is going to,
we're going to see in this video now.
it doesn't move blocks around it just
collect them, 'kay?
unless if, if, if you compact, then the
effect is moving it but we're not
going to be looking at that.
So anyways, there is a bunch of of, of
algorithms and note that it started long
time ago, 1960, so it's been, it's been a
problem for a while.
If interesting and learn, we're going to
give you a basic overview in this video.
But if you do want to learn more, there's
this really nice book Garbage Collection:
Algorithms for Automatic Dynamic Memory
by Jones and Lin.
It's a really cool book, if you're
interested in that, I definitely
encourage you to read it.
So let's get started with the basics.
First of all, we're going to look at
memory as a graph, as a directed graph,
'kay?
So, each allocated heap block is a node
in the graph and each pointer is a notch
in the graph.
Okay?
So, and locations that are not in the
heap that contain pointers into the heap
are called root nodes.
They're these things that are outside the
heap.
So these are the heap nodes.
So you do the, the root nodes.
Okay?
So, green nodes here are reachable,
meaning that there is a pointer from the
roots to them, and the other ones are not
the ones we rather not reachable.
'Kay?
So, and we define reachable as follows.
A node is reachable if there's a path
from any root node to to that node, 'kay?
So for example, so this one is green
because there's a path, this one is green
because there's a path, through here.
But now, this one, these ones are not
reachable, because there's no way to get
from any of the root nodes to them, 'kay?
So these ones are, are the ones that
we're going to consider garbage, are the
ones that we're going to collect.
'Kay?
So, let's see it how mark-and-sweep
works.
It's one of the classic, simplest,
algorithms that, that do that.
And that could be built on top of malloc
and free package.
So we, we allocate using malloc, until we
run out of space and then you do garbage
collection.
'Kay?
And when, when you're out of space,
here's what we're going to do.
We're going to use an extra mark bit in
the head of each block.
Okay?
And then we have this mark phase that
starts a lot of the roots and sets the
mark page on each reachable block, 'kay?
And once we do that, over the entire
heap, we can sweep, scan over all blocks
in the heap and free the ones that are
not marked, because we know that those
are not reachable, okay?
So here's an example.
Visual example.
here's what we have before mark.
We have a root another points here, okay?
So from there, we're going to traverse.
Now we, we're going to, we're going to
traverse this and mark the free block.
So from here, we can reach here, so we
mark this one, okay?
So from this one, we can reach this one,
so can mark this one.
And, from so since it points here, from
this one, we can reach here so we marked
this one, 'kay?
Once that's done, we're going to have the
sweep phase, which we know that these
here were not, this was not marked and
this was not marked, so we can go ahead
and free both of them.
Pretty simple, right?
So, let's think of this as a simple
implementation.
But we need assumptions for that.
First, this new, this function new here
with n returns a pointer to a new block
with all locations cleared.
[INAUDIBLE] and read b,i reads location i
of block b into a register, 'kay?
And writes b,i and v, write v into
location i of block b.
So, it does is, it just add, no, location
i to block b.
This is v, okay?
And each block is going to have a header
word that's pointed by b minus 1, 'kay?
So having these restriction here, you
know?
So if, if we're going to use this
function, if applications use this
function, now we can actually keep track
of what is pointers, and where allocation
was being used, and so on, okay?
So the functions used by the garbage
collection is going to be as follows.
We're going to check whether certain p is
a pointer to a block, okay?
So whether something, whether p, so if
you pass p as a parameter, this returns
through if p happens to be a pointer to a
block.
Length of p just tells us the size of the
block not including the header, and get
roots, returns all of the roots in the
system.
Root again, roots are all of the pointers
that are outside the heap, 'kay?
Alright, so let's see how this works.
So, this is the mark phase we started
with we started with the pointer p, 'kay?
So if it's, if p is not a pointer, it
just return, okay?
So also, if its already marked, we return
as well.
Okay?
But now, we go if, if we are here, it,
because it is a pointer hasn't been
marked so we mark it.
And not only that, we're going to go over
the entire block pointed by p and we're
going to go call mark it.
So parts of it are going to be pointers,
parts of this block is not going to be
pointers.
Okay?
So we, we recur, so this is going to mark
the entire graph, so we're going to
traverse entire heap and mark it a, and,
and mark whatever is a pointer and and so
on.
Okay?
Now sweep is going to use length to find
the next block.
Okay?
So we're going to start with p, and we
know, we know where, where it ends.
Okay?
So while p has not reach the end, if the
mark bit is set, we clear the mark bit.
Now, if the allocated bit is set, so but
it, it, if it's not marked, and the
allocate bit is set, we're going to go
and free it.
Because we know it's allocated, but it's
not marked.
It means it's not reachable.
Okay?
Great.
So now we're going to go to, to, to the
next pointer, we're just adding length.
'Kay?
That's pretty cool, pretty simple.
Now, how would we make that work in C?
Well, the challenge of doing this in C is
that, remember that we need this
function, say the, whether certain
location's a pointer.
So in C, anything can be positive.
That, that's a little bit complicated.
So we have to keep track of when you put
pointers.
You have to, to somehow tell a run-time
system that a system location is a
pointer.
Okay?
So but also in C, a pointer can point to,
to the middle of an allocated block.
So that makes it tricky to find allocated
allocated blocks in the mark phase.
Okay.
So the way to solve this as I said by
calling the run-time system, call the
information and so on.
but the, the upshot is that we can make
it work somehow, but it's going to be
conservative.
It might miss, blocks if everything that
it marks, everything that it decides that
can be collected, indeed can be
collected.
However, you might miss some unreachable
blocks and not free those.
So, it's not it might miss something.
But it's it should work reasonably well.
'Kay?
See you soon.

You might also like