Wednesday, January 7, 2009

Pro's and Con's of returing references

I've recently began working on a moderate size C++ application. The previous developer loved the benefits of returning const&.

For those who don't know returning const& can avoid an extraneous copy. Example:


class Foo {
Foo(std::string s) { m_name = s; }
std::string m_name;
public:
std::string const& name() { return m_name; }
}


The above code avoids the copy from m_name to the function return value that would have resulted with the following client code:


Foo f("bar");
std::string name = f.name();



After working in this code base for a while now I believe that returning references are evil and should be treated just like returning a pointer, which is avoid it.

For example the problem that arose that took a week to debug was the following:


class Foo {
std::vector< Bar > m_vec;
public:
void insert(Bar& b) { m_vec.push_back(b); }
Bar const& getById(int id) { return m_vec[id]; }
}


The problem in this example is clients are calling and getting references that are stored in the vector. Now what happens after clients insert a bunch of new elements? The vector needs to resize internally and guess what happens to all those references? That's right there invalid. This caused a very hard to find bug that was simply fixed by removing the &.

The conclusion I've arrived at is that returning a & is a pre-mature optimization. If your profiling shows you your code is very slow at this point than maybe consider it. But you should also be aware that most modern compilers optimize away this extra copy anyways.

Here is a contrived example that shows problems you can get into by blindly returning references. My advice avoid them until your profiler tells you that you have a problem.

#include
#include
#include

using namespace std;

class foo {
std::string m_string;
public:
std::string const getName(){
static int count = 0;
count++;
std::stringstream ss;
ss << count ;
m_string = ss.str();
return m_string;
}
};
foo f;
void fun(std::string const&s) {
f.getName();
std::cout << s;
}
int main() {
std::cout << " print version " << std::endl;
fun(f.getName());
}


Hope that Helps

4 comments:

r.o.e said...

why not mark the accessor as const if you intended it to be read-only, like this
Bar const& getById(int id) const
{ return m_vec[id]; }

developer-resource said...

Marking the method as const doesn't solve the problem. Although strictly speaking it should be const.

Here is an example program. I encourage you to try it out as it illustrates the problem. This example is rather contrived but in a my system this happened because the 'Foo' object was a registry that was called from many places in our code and causes many inserts to happen thus invalidating the back end vector.


#include "stdafx.h"
#include <iostream>
#include <vector>

class Bar {
int m_id;
public:
Bar() {
static int id = 0;
m_id = id++;
}
int getId() const {
return m_id;
}
};

class Foo {
std::vector< Bar > m_vec;
public:
void insert(Bar& b) { m_vec.push_back(b); }
Bar const& getById(int id) { return m_vec[id]; }
};


Foo f;
// This method fails. b.getId() will not be equal to desired unless there is some fluke in your memory.
// It may blow up but more likely will print garbage.
// Change to void test(Bar b , int desired) { for a working example.
void test(Bar const& b , int desired) {
for(int i=0; i<100000; i++)
f.insert(Bar());
std::cout << b.getId() << " == " << desired << std::endl;
}
int main(int argc, char* argv[])
{
f.insert(Bar());
test(f.getById(0), 0 );
}

tomgee said...

Sorry, I was reading too fast and misread your post.

I tried your code and understood your points. The issue to me is in fact you were trying to return a reference to a temporary object(m_vec is not as long f is alive, but the Bar objects inside m_vec are), which is always not recommended.

However, there's a trick to get your code works with minor change by
-1) reserving m_vec in the ctor of Foo:

Foo()
{
m_vec.reserve(MAX_SIZE);
}

2) adding check to insert:
void Foo::insert(const Bar& b)
{
if(m_vec.size < MAX_SIZE)
m_vec.push_back(b);
}

I've tested with GCC and it works.

BTW, in your code, void insert(Bar& b) needs to be void insert(const Bar& b), or it does not compile.

Regards,
Tom

Taras said...

The issue here is not that returning by reference is evil, just that what you are returning a reference to can become invalid.

Page 153, section 6.2 of "C++ Standard Library: A Tutorial and Reference" - Josuttis, reads:

"Inserting or removing elmeents invalidates references, pointers, and iterators that refer to the following elements. If an insertion causes reallocation, it invalidates all references, iterators,
and pointers"

Your code sample is as evil as holding a reference to the first element of the vector, inserting 1000 elements into the vector, and then trying to use the existing reference.