Creating memory leak test scripts - take 3

I made an obvious error in my previous round of experiments. We're going to start afresh.

Circular references - leaking

First the obvious: if you create circular references (node A refers to node B, and node B refers back to node A), you get memory leaks.
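A minimal sketch of the pattern, assuming plain JavaScript objects as stand-ins for two HTML elements such as a header and the div it opens. The names `header`, `panel`, `relatedElement` and `toggle` are hypothetical. Outside IE's DOM, a cycle between ordinary objects is collected normally, so this only shows the shape of the references, not the leak itself.

```javascript
// Plain objects stand in for two HTML elements: a header and the div it opens.
// In IE these would be real DOM nodes, and the cycle below is what leaks.
const header = { tagName: 'H3', relatedElement: null };
const panel = { tagName: 'DIV', relatedElement: null, open: false };

// Node A refers to node B ...
header.relatedElement = panel;
// ... and node B refers back to node A: a circular reference.
panel.relatedElement = header;

// The convenience that makes the pattern attractive: no lookup is needed
// to get from the header to its div.
function toggle(h) {
  const p = h.relatedElement;
  p.open = !p.open;
  return p.open;
}

console.log(toggle(header)); // true
```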

That my first script leaks is a problem. I often use circular references between two HTML elements, for instance a header and a div that should open when the user clicks on the header. Such references make for far simpler scripts, since I don't have to find the related element every time I need it.

To Do: Find a safe way of creating circular references.

Event handling

Next we set event handlers on the 10,000 links in a slightly odd way, and IE leaks like a sieve. (Thanks to Tino Zijdel for the example.)

Every click event handler refers to obj, and JavaScript's black magic (closures) makes sure that when a link is clicked, obj still means the same as when the function was defined: the current link.
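The test code itself isn't reproduced here, so this is a sketch of the pattern as described, assuming mock link objects (a plain object whose `firstChild` stands in for the link's text node) and a hypothetical helper `assignClick`. It shows how `obj` keeps pointing at "its" link through the closure; it cannot reproduce the IE leak outside a browser.

```javascript
// Mock link: a plain object with a text node as firstChild (stand-in for an <a>).
function makeLink(text) {
  return { firstChild: { nodeValue: text }, onclick: null };
}

// Hypothetical helper in the style of the leaking test: the handler is created
// inside a function whose parameter `obj` is the element itself.
function assignClick(obj) {
  obj.onclick = function () {
    // `obj` is resolved through the closure, so it still means this link.
    obj.firstChild.nodeValue = ' Clicked! - ';
  };
  // The element now holds the function, and the function's scope holds the
  // element: in IE, that DOM <-> JScript cycle is what leaks.
}

const links = [];
for (let i = 0; i < 3; i++) {
  const link = makeLink('A link - ');
  assignClick(link);
  links.push(link);
}

links[1].onclick(); // changes only the second link's text
```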

Of course we never actually define event handlers in this odd way; instead we'd use the this keyword. Rather to my surprise, my fourth test, which does use this, also causes memory leaks. I thought the this keyword was immune to them. Apparently I was wrong.

The solution is not to use a separate function for the assignment of the event handler. My fifth and sixth tests don't use one and don't cause memory leaks, and they are the more normal ways of setting event handlers.
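A sketch of the non-leaking style described above, assuming an array of mock link objects in place of `document.getElementsByTagName('a')`: the handler is assigned directly in the loop, with no helper function that receives the element as a parameter, and it reaches its element through `this`. In a browser the click handler is invoked with `this` set to the link; here it is called as a method to get the same `this`.

```javascript
// Mock links standing in for document.getElementsByTagName('a').
const x = [1, 2, 3].map(function () {
  return { firstChild: { nodeValue: 'A link - ' }, onclick: null };
});

// The handler is created directly in the loop and finds its element
// through `this` at click time rather than through a captured parameter.
for (var i = 0; i < x.length; i++) {
  x[i].onclick = function () {
    this.firstChild.nodeValue = ' Clicked! - ';
  };
}

x[2].onclick(); // called as a method, so `this` is x[2]
```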

Since these last two scripts are in line with common ways of setting event handlers, while the leaking scripts are very odd and non-real-life, I don't see the problem with event handling. As long as you use simple, common ways of setting event handlers, what can go wrong?

To Do: Find a real-life way of setting event handlers that causes memory leaks.

Of course, if anyone has a useful clue for one of my two To Do points, please comment.

Comments

1 Posted by Analgesia on 20 October 2005

Please note that in your test pages the nextSibling of an anchor is also an anchor.
So in the first loop A refers to B and B to A;
in the second loop B refers to C and C to B.
So in effect B will no longer refer to A but (only) to C.
So the only circular reference will be Y to Z and Z to Y (where Y is the last link that has a nextSibling).
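A small simulation of the two possible readings of the test loop, assuming mock link objects and a hypothetical `related` expando (the real test's property name isn't shown here). With a step of 1, each pairing overwrites the previous one as described above; with a step of 2 (the `i+=2` Tino Zijdel points to in his reply), every pair stays mutual.

```javascript
// Mock links; each link's nextSibling is the next link, as on the test pages.
function makeLinks(names) {
  const links = names.map(function (name) {
    return { name: name, nextSibling: null, related: null };
  });
  for (let i = 0; i < links.length - 1; i++) links[i].nextSibling = links[i + 1];
  return links;
}

// Pair each visited link with its nextSibling, stepping through the list by `step`.
function pairLinks(links, step) {
  for (let i = 0; i < links.length - 1; i += step) {
    const a = links[i];
    const b = a.nextSibling;
    a.related = b;
    b.related = a;
  }
}

// Count pairs that still refer to each other (each pair counted once).
function mutualPairs(links) {
  return links.filter(function (l) {
    return l.related !== null && l.related.related === l &&
      links.indexOf(l) < links.indexOf(l.related);
  }).length;
}

const stepOne = makeLinks(['A', 'B', 'C', 'D']);
pairLinks(stepOne, 1);
console.log(mutualPairs(stepOne)); // 1: only C <-> D survives

const stepTwo = makeLinks(['A', 'B', 'C', 'D']);
pairLinks(stepTwo, 2);
console.log(mutualPairs(stepTwo)); // 2: A <-> B and C <-> D
```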

2 Posted by Tino Zijdel on 20 October 2005

It doesn't seem to be the 'this' keyword that is actually responsible for the memory leak, since this leaks as well:

function createNewClick(obj)
{
    // The handler never uses obj, yet obj is still in its scope
    obj.onclick = function() { alert('clicked'); };
}

Furthermore, I indicated in one of my recent posts that referencing a child object also causes memory leaks; I will need to make that more specific, since this test doesn't seem to leak: http://therealcrisp.xs4all.nl/upload/leaktest3.html, whereas this one does: http://therealcrisp.xs4all.nl/upload/leaktest4.html
The only difference in the second example is that I'm referencing a textNode instead of an elementNode.

3 Posted by Tino Zijdel on 20 October 2005

Analgesia: please note the i+=2 in the for construct ;)

As to a safe way of creating circular references: I normally just write a function that removes the references onunload...
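A minimal sketch of that kind of cleanup function, assuming every element that gets a circular reference or handler is recorded in a hypothetical `tracked` list. The names `linkElements`, `releaseReferences` and `relatedElement` are illustrative, not Tino's code. In a browser you would hook it up with `window.onunload = releaseReferences;`; here it is called directly on mock objects.

```javascript
// Hypothetical bookkeeping: every element that receives a circular reference
// or an event handler is recorded so the references can be broken later.
const tracked = [];

function linkElements(a, b) {
  a.relatedElement = b;
  b.relatedElement = a;
  tracked.push(a, b);
}

// Break every recorded reference. In a browser this would run onunload.
function releaseReferences() {
  for (let i = 0; i < tracked.length; i++) {
    tracked[i].relatedElement = null;
    tracked[i].onclick = null;
  }
  tracked.length = 0; // forget the elements themselves as well
}

const h = {};
const d = {};
linkElements(h, d);
h.onclick = function () { return this.relatedElement; };
releaseReferences();
```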

4 Posted by Lon on 20 October 2005

The leaking scripts are absolutely not odd or non-real-life. That's the way we usually write code at Q42.

You just have to understand how the memory leak works. There are plenty of pages on the net explaining the problem. Here's one: http://msdn.microsoft.com/library/en-us/ietechcol/dnwebgen/ie_leak_patterns.asp?frame=true

Read it carefully and it should all become clear.

Furthermore, why create 10,000 links? Look at Laurens' code. Just append a 10MB string to one HTML element and you'll see 10MB leaking away on each reload. You only need document.body this way. This makes for far smaller, faster and more understandable test scripts. How to get a 10MB string? var s = new Array(1000).join(new Array(1000).join("1234567890"));
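The arithmetic behind that one-liner: `new Array(n).join(sep)` places `sep` between n empty strings, so it yields n − 1 copies. The inner join gives 999 × 10 = 9,990 characters and the outer join 999 × 9,990 = 9,980,010 characters, just under 10 million. How many bytes that occupies depends on how the engine stores strings. In a browser the string would then be hung on an element, for example as a hypothetical `document.body.bigString` expando.

```javascript
// new Array(n).join(sep) puts sep between n empty strings: n - 1 copies of sep.
const inner = new Array(1000).join("1234567890"); // 999 copies of 10 characters
const s = new Array(1000).join(inner);             // 999 copies of inner

console.log(inner.length); // 9990
console.log(s.length);     // 9980010
```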

Laurens has the perfect solution to leakage caused by attaching events. Look at his website.

5 Posted by Sjoerd Visscher on 20 October 2005

Note that what is inside the function is completely irrelevant to memory leaks. It is about the values of the variables "around" the function (in scope of the function) at the time of unload.

6 Posted by Peter Siewert on 20 October 2005

As Sjoerd said, in test4 the leak is due to scope. You have a DOM object as an available variable when you create a new function, and that function is attached to the same DOM object, creating a circular reference. When the anonymous function is created, all the variables in the local scope are stored for that function's scope (even though they are not referenced by that function). This affects not only function parameters, but also local variables declared in the same function in which the new anonymous function was created.
Unused variables are stored because of JavaScript's dynamic nature. If the onclick function were alert("This is a "+eval("obj.nodeName")), it would still remember the value of obj (as well it should), even though there is no direct reference to the obj parameter.
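A runnable version of that eval example, assuming a plain object with a `nodeName` in place of a real link. The inner function never names `obj` in its own code, yet `eval` reaches it at call time, which is why an engine cannot simply throw the enclosing scope away. Modern engines may drop captured variables that are provably unused when no `eval` is present. The claim that JScript keeps the whole scope is Peter's description, not something this sketch can show.

```javascript
// Mock element standing in for a DOM node.
const link = { nodeName: 'A' };

function createHandler(obj) {
  // The returned function never refers to `obj` directly ...
  return function () {
    // ... but a direct eval can reach it when the function runs,
    // so `obj` must stay alive as long as this function does.
    return 'This is a ' + eval('obj.nodeName');
  };
}

const handler = createHandler(link);
console.log(handler()); // "This is a A"
```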

7 Posted by Peter Siewert on 21 October 2005

Oops. I forgot to address your to-dos:
1. Whenever I need to create a circular style reference between two DOM elements, I will ensure they each have an id attribute (and generate a random one if they do not), then store that id string in the other element. You can get the actual reference using document.getElementById(this.relatedElementId)
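A sketch of the id-based approach, runnable outside a browser. A hypothetical `Map` plays the part of the document's own id lookup (`getElementById`), and ids come from a counter rather than a random generator so the result is predictable. The names `register`, `relate` and `relatedElementId` are illustrative. Each element stores only a string, so neither holds an object reference to the other.

```javascript
// Stand-in for the document's id index and document.getElementById.
const elementsById = new Map();
let idCounter = 0;

function register(el) {
  if (!el.id) el.id = 'auto-id-' + (++idCounter); // generate an id if missing
  elementsById.set(el.id, el);
  return el.id;
}

function getElementById(id) {
  return elementsById.get(id) || null;
}

// Store the partner's id string instead of the partner itself.
function relate(a, b) {
  a.relatedElementId = register(b);
  b.relatedElementId = register(a);
}

const header = { id: 'hdr' };
const panel = {}; // no id yet: one is generated
relate(header, panel);

// Later, e.g. inside a click handler: look the partner up when needed.
const target = getElementById(header.relatedElementId);
```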

2. If you are creating a DOM element and assigning newly created anonymous functions to it within the same function, you will cause a leak: http://www.geocities.com/petersiewertweb/scopeLeak.html

8 Posted by Florent Guillaume on 21 October 2005

Note that the fact that circular references create memory leaks is just a sign of a poor implementation. There are several well-known techniques to do garbage collection in the presence of cycles, some have been known for more than twenty years, for instance mark and sweep.

It would be interesting to find a javascript implementation that actually uses these techniques.
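A toy model of the difference, assuming a heap where each node simply lists the nodes it points to. This illustrates the two strategies only; it is not how JScript, COM or any other real engine is built. Under reference counting, two nodes that point at each other keep each other's count at 1 forever. Mark and sweep starts from the roots and finds them unreachable.

```javascript
// Toy heap: each node lists the nodes it points to.
function makeNode(name) {
  return { name: name, refs: [] };
}

// Reference counting: a node is freed when its count reaches zero; freeing it
// drops its outgoing references, which may free further nodes.
function refCountCollect(nodes, roots) {
  const counts = new Map(nodes.map(function (n) { return [n, 0]; }));
  roots.forEach(function (r) { counts.set(r, counts.get(r) + 1); });
  nodes.forEach(function (n) {
    n.refs.forEach(function (t) { counts.set(t, counts.get(t) + 1); });
  });
  const freed = [];
  const queue = nodes.filter(function (n) { return counts.get(n) === 0; });
  while (queue.length > 0) {
    const n = queue.pop();
    freed.push(n);
    n.refs.forEach(function (t) {
      counts.set(t, counts.get(t) - 1);
      if (counts.get(t) === 0) queue.push(t);
    });
  }
  return freed;
}

// Mark and sweep: mark everything reachable from the roots; the rest is garbage.
function markAndSweep(nodes, roots) {
  const marked = new Set();
  const stack = roots.slice();
  while (stack.length > 0) {
    const n = stack.pop();
    if (marked.has(n)) continue;
    marked.add(n);
    n.refs.forEach(function (t) { stack.push(t); });
  }
  return nodes.filter(function (n) { return !marked.has(n); });
}

// A and B form a cycle; C -> D is an ordinary chain. No roots remain.
const a = makeNode('A'), b = makeNode('B'), c = makeNode('C'), d = makeNode('D');
a.refs.push(b);
b.refs.push(a);
c.refs.push(d);
const heap = [a, b, c, d];

console.log(refCountCollect(heap, []).map(function (n) { return n.name; })); // [ 'C', 'D' ]
console.log(markAndSweep(heap, []).map(function (n) { return n.name; }));    // [ 'A', 'B', 'C', 'D' ]
```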

9 Posted by liorean on 21 October 2005

In my experience, the cause of these leaks is always that one JS property on a COM object references another COM object which has a similar reference. It doesn't have to be circular; it just has to cross the engine border more than once.

As for the fourth case causing a memory leak, let's see what does that:

The createNewClick function takes a named argument obj. Named arguments are local variables, so the function literal inside it closes over obj, creating your circular reference.

One way of getting around this problem is by externalising all references: instead of the event handler containing a reference, let it look the reference up in a global.
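A sketch of externalised references, assuming mock objects in place of DOM elements. A hypothetical global table `relatedElements` owns the references, each element stores only a string key, and one shared handler defined at top level looks the partner up when it runs. Because that handler is not created inside a function that has the element in scope, it closes over no element at all. The names `relatedElements`, `relatedKey`, `toggleRelated` and `connect` are illustrative.

```javascript
// Global table that owns the element references.
var relatedElements = {};

// Defined at top level: its scope contains no element. It finds its
// partner by key at click time, with `this` being the clicked element.
function toggleRelated() {
  var target = relatedElements[this.relatedKey];
  target.open = !target.open;
  return target.open;
}

function connect(trigger, target, key) {
  trigger.relatedKey = key;        // a string, not an object reference
  relatedElements[key] = target;
  trigger.onclick = toggleRelated; // shared function, no closure over trigger
}

const trigger = {};
const menu = { open: false };
connect(trigger, menu, 'menu');
trigger.onclick(); // opens the menu
```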

10 Posted by liorean on 21 October 2005

Ah, some corrections... s/borded/border/ and s/createNewOnclick/createNewClick/

Florent: Well, the fact is, most browsers use a mark and sweep garbage collector. But the problem really doesn't lie in the garbage collector, but in the fact that there are foreign objects that are not subject to that garbage collector. Gecko has no fewer than four different garbage collecting systems in the browser, but has far fewer leaks than Trident. The problem in Trident (as well as WebKit, it would seem, another very leaky engine) is that there are very frequent crossings between objects governed by different garbage collectors. Each garbage collector by itself can handle its own circular references, and even simple cases of crossing the boundaries, but not complex cases where the boundaries are crossed multiple times.

11 Posted by James Mc Parlane on 21 October 2005

Reference x in your closure, not 'this'. I suspect that 'this' is evaluated at runtime with respect to the function, whereas x will be the classic closure variable reference, and because x is a DOM element, it should form a nice circular reference.

12 Posted by James Mc Parlane on 21 October 2005

That should have been 'reference x[i]', and it was in reply to the first function in your second To Do.

You will get a leak if you create a real closure, not just a local function declaration.

A closure needs to reference a variable outside the function scope.

In IE - the blind spot for the GC seems to be the DOM. So the circular reference you create needs to include a DOM element.

Expando is too handy to have to worry about leaks. I want a flexible and free object model where I can create any object in any other object and not have to worry.

So my approach is to use my own garbage collector, which you probably saw in my entry. I garbage collect elements with listeners that are not attached to the DOM, which goes to the heart of the problem in IE.
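A sketch in the spirit of that approach, not James's actual code: a hypothetical registry remembers every element given a listener, and a sweep drops the handler from any element whose `parentNode` chain no longer reaches the document. Mock nodes and a `documentRoot` object stand in for the real DOM. The names `addListener`, `collectDetached` and `withListeners` are illustrative.

```javascript
// Mock nodes: parentNode links up to a root that stands in for the document.
const documentRoot = { parentNode: null };

function isAttached(node) {
  // Walk up the parentNode chain; attached nodes eventually reach the root.
  for (let n = node; n; n = n.parentNode) {
    if (n === documentRoot) return true;
  }
  return false;
}

// Registry of every element we gave a listener to.
let withListeners = [];

function addListener(el, fn) {
  el.onclick = fn;
  withListeners.push(el);
}

// Sweep: drop handlers from elements no longer in the tree, breaking the
// element <-> function link, and forget those elements. Returns the count.
function collectDetached() {
  let released = 0;
  withListeners = withListeners.filter(function (el) {
    if (isAttached(el)) return true;
    el.onclick = null;
    released++;
    return false;
  });
  return released;
}

const kept = { parentNode: documentRoot };
const removed = { parentNode: documentRoot };
addListener(kept, function () {});
addListener(removed, function () {});
removed.parentNode = null; // element taken out of the tree
const releasedCount = collectDetached();
```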

I see that nobody else has tried the test case of what happens when you delete elements with listeners using the two ways of deleting.

In answer to your first To-Do - one way to create safe circular references is to implement a smart GC that can find them or register objects of interest with a GC or use a naming standard that allows a full scan GC to find your objects quickly.

http://blog.metawrap.com/blog/MyEntryForTheQuirksModeAddEventRecodingContest.aspx

13 Posted by Erik Arvidsson on 21 October 2005

Why don't the last two leak? Is it because the closure references x instead of x[i]? The closure definitely references a COM object (the NodeList).

14 Posted by Erik Arvidsson on 21 October 2005

I meant test5.html. (test6 is ok)

It actually leaks one NodeList (x) per time. This is not a lot of memory but letting it run for a few minutes shows this.

15 Posted by Laurens van den Oever on 21 October 2005

PPK: TODO2: I modified your test5.html script slightly, in a real-world way, to make it leak:

http://laurens.vd.oever.nl/weblog/memoryleaks/

Yesterday I posted a generic fix to the problem (TODO1):

http://laurens.vd.oever.nl/weblog/items2005/closures/

16 Posted by Laurens van den Oever on 21 October 2005

Erik: I don't see that memory leak. Not even if I add a bigString to it. So to me it seems that IE can free references to NodeLists.

17 Posted by James Mc Parlane on 21 October 2005

It's all about scope. You can make it leak one object easily, but it has to be a real closure. If you want it to leak badly, you want to leak one object per closure and element, so you need a new variable in a new scope for each closure. An effective way to do that is to call a function with a parameter and reference that parameter in the closure.

The closure will reference an object that has been on the stack and then pushed off the stack, so it can't be re-assigned by name in another scope.
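A small illustration of "a new variable in a new scope for each closure", using plain strings instead of elements so it runs anywhere. Closures created directly in a loop with `var` all share the one function-scoped variable, while calling a helper gives each closure its own parameter binding, which is the pattern `assignListener(p_e)` relies on. The function names are illustrative.

```javascript
// Closures created in the loop share the single function-scoped `var item`.
function sharedScope(items) {
  var handlers = [];
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    handlers.push(function () { return item; });
  }
  return handlers;
}

// Calling a helper creates a fresh scope, so each closure gets its own p_item.
function makeHandler(p_item) {
  return function () { return p_item; };
}

function separateScopes(items) {
  var handlers = [];
  for (var i = 0; i < items.length; i++) {
    handlers.push(makeHandler(items[i]));
  }
  return handlers;
}

const shared = sharedScope(['a', 'b', 'c']).map(function (h) { return h(); });
const separate = separateScopes(['a', 'b', 'c']).map(function (h) { return h(); });
console.log(shared);   // [ 'c', 'c', 'c' ]
console.log(separate); // [ 'a', 'b', 'c' ]
```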

So the following code, grafted into the example,

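function init()
{
    var T1 = (new Date()).getTime();

    createLinks(); // defined in the original test page

    var x = document.getElementsByTagName('a');

    for (var i = 0; i < x.length; i++)
    {
        var l_e = x[i];
        assignListener(l_e); // each call creates a new scope with its own p_e
    }
    alert("it took " + ((new Date()).getTime() - T1) / 1000 + " seconds to render this\r\nHit F5 a few times and see what happens to the render time.");

}

function assignListener(p_e)
{
    // The closure refers to p_e, the parameter of this particular call,
    // so every link gets its own element <-> function cycle.
    p_e.onclick = function ()
    {
        p_e.firstChild.nodeValue = ' Clicked! - ';
    };
}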

Should leak rather badly.

I use the render time to measure the internal complexity of the browser. Press F5 a few times and, if I have done my sums right, the times should get longer :)

18 Posted by James Mc Parlane on 21 October 2005

Drip may be flawed. I tried my script with Drip and it tracked no leak, which is odd, because the theory is sound...

But lo and behold, the render timing and task manager tells another story.

Load the link below, press F5 a few times and see what happens to the render time and memory usage.
http://test.metawrap.com/javascript/tests/fundamental/test_35_closure_leak.html

19 Posted by Tino Zijdel on 21 October 2005

Drip is indeed not perfect; sometimes it shows references that are actually cleaned up automatically by the browser, and on the other hand it fails to show memory leaks due to closures. Blow memory is a nice indication, though...

20 Posted by Ismael Jurado on 21 October 2005

There's an article by Mishoo in http://www.bazon.net/mishoo/articles.epl?art_id=824 that explains very clearly why IE leaks memory and how to avoid it.

I find it very useful!

21 Posted by James Mc Parlane on 22 October 2005

I've collected together some of my ideas on the topic of closure leaks in IE.

http://blog.metawrap.com/blog/IEClosuresLeaks.aspx

22 Posted by Jimmi Thøgersen on 22 October 2005

"Drip may be flawed. I tried my script with Drip and tracked no leak which is odd, because the theory is sound..."

Indeed, looking at the source, Drip will only take care of leaked DOM elements that have been created using document.createElement after page load.

Since all the elements you attached as children of the document have been created using cloneNode, it won't be able to track them.
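A sketch of why a createElement hook misses cloned nodes, assuming a mock `doc` object; this is not Drip's real code. Per the addendum further down, Drip swaps `document.createElement` for a function that reports each new element back. `cloneNode` builds copies without ever calling `createElement`, so the copies never reach the tracker.

```javascript
// Mock document whose nodes can be cloned without going through createElement.
const doc = {
  createElement: function (tag) {
    return {
      tagName: tag.toUpperCase(),
      cloneNode: function () {
        // Copies the node directly; createElement is never called.
        return Object.assign({}, this);
      }
    };
  }
};

// Drip-style hook: wrap createElement so every new element is recorded.
const tracked = [];
const originalCreateElement = doc.createElement;
doc.createElement = function (tag) {
  const el = originalCreateElement.call(this, tag);
  tracked.push(el);
  return el;
};

const template = doc.createElement('a'); // recorded by the hook
const copies = [];
for (let i = 0; i < 3; i++) {
  copies.push(template.cloneNode(true)); // bypasses the hook
}

console.log(tracked.length); // 1
```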

I'm hardly an expert on the DOM - or the MS WebBrowser2 ActiveX control - and this whole leak issue is driving me insane already. But I'd think it would be possible to extend Drip in a lot of ways - so here's the idea for someone adequately talented:

If you can replace document.createElement on pageload through the ActiveX control, surely you could also traverse the already existing DOM of the static HTML to get references to the entire DOM tree - and then replace all functions that create new DOM nodes in a similar way to the replacement of createElement. I can't think of a way to replace, say, properties that modify the DOM tree - such as innerHTML - though ;)

The end result would be that Drip had a record of the entire DOM and could then find every single DOM object that wasn't released.

23 Posted by Joost Diepenmaat on 22 October 2005

@Florent Guillaume:
Reference counting collectors can be quite useful: they're probably the fastest way of doing garbage collection, and you can have a guaranteed time of destruction. See for example Perl as an implementation that works quite well.

The big problem with reference counting in JavaScript is that, as far as I know, there is no way of adding destructors to arbitrary objects, which makes it a lot harder to seamlessly clean up self-referencing collections.

Interestingly, IE's reference-counting scheme does not apply to "pure" js objects, only to DOM objects (maybe because COM uses reference-counting? I don't know). At least this code does not seem to leak at all:

http://zeekat.nl/circ-refs.html

24 Posted by Jimmi Thøgersen on 22 October 2005

Small (well, lengthy) addendum... When I reached 1250, I started cutting out bits that in hindsight were rather important. The reason I say "If you can replace document.createElement..." is that that's the approach of Drip - it adds a bit of temporary javascript to the page, which replaces document.createElement with a javascript function that sends the element back to Drip.

So, that javascript function could be extended to replace other functions etc. But a possibly better approach:

Since there are some DOM Inspectors for IE out there (and some of them open source), it should be possible to add this functionality. Supposedly, that would get rid of any troubles with innerHTML etc. in the Drip approach, since the DOM inspectors seem to be "live": not using JavaScript hooks, but still getting the same access as Drip to the IHTMLElement that contains both element properties AND the reference count.

So, it should be possible to modify such a DOM Inspector to keep track of any elements that are lost in the DOM tree, but haven't been freed.

All this babble, simply because as a programmer, it's nice to know how to recognize leaks, but it's nicer to have a program to help you when you slip ;) Hopefully we'll see one.

25 Posted by Erik Arvidsson on 23 October 2005

Laurens: Here is the modified code that shows that test5.html does indeed leak (I had it running for a few minutes):

window.onload = init;

function createLinks() {
    var x = document.body;
    var y = document.createElement('a');
    y.appendChild(document.createTextNode('A link - '));
    y.href = '#';
    // Append 2000 cloned links to the body
    for (var i = 0; i < 2000; i++) {
        x.appendChild(y.cloneNode(true));
    }
}

function init() {
    var x = document.getElementsByTagName('a');
    for (var i = 0; i < x.length; i++) {
        x[i].onclick = function () {
            this.firstChild.nodeValue = ' Clicked! - ';
        };
    }

    // Reload every 50 ms so that any leak accumulates quickly
    window.setTimeout("document.location.reload()", 50);
}

createLinks();

Damn, I'm not so sure any more... Maybe it leaks for other reasons? There is a reference from JScript to the NodeList COM object. COM references the function object and that object references the closure...