Question
This question got a rather frosty reception at SO, so I decided to delete it there and try here instead. If you think it does not fit here either, please at least leave a comment suggesting how to find the kind of example I'm after...
Can you give an example where using C99 VLAs offers a real advantage over the current standard heap-using C++ RAII mechanisms?
The example I am after should:
Achieve an easily measurable (maybe 10%) performance advantage over using the heap.
Not have a good workaround that would avoid needing the whole array at all.
Actually benefit from using dynamic size, instead of fixed maximum size.
Be unlikely to cause stack overflow in normal use scenario.
Be strong enough to tempt a developer needing the performance to include a C99 source file in a C++ project.
Adding some clarification on the context: I mean VLAs as defined by C99 and not included in standard C++: int array[n], where n is a variable. And I am after an example of a use case where they trump the alternatives offered by other standards (C90, C++11):
int array[MAXSIZE]; // C stack array with compile time constant size
int *array = calloc(n, sizeof(int)); // C heap array with manual free
int *array = new int[n]; // C++ heap array with manual delete
std::unique_ptr<int[]> array(new int[n]); // C++ heap array with RAII
std::vector<int> array(n); // STL container with preallocated size
Some ideas:
Functions taking varargs, which naturally limits the item count to something reasonable, yet without any useful API-level upper limit.
Recursive functions, where wasted stack is undesirable (a rough sketch of this follows the list).
Many small allocations and releases, where heap overhead would hurt.
Handling multi-dimensional arrays (like arbitrarily sized matrices), where performance is critical and small functions are expected to get inlined a lot.
From a comment: a concurrent algorithm, where heap allocation has synchronization overhead.
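To make the recursive idea concrete, here is a rough sketch of the shape I have in mind (not a measured example, and the merge sort itself is just a stand-in): a recursive function whose per-level scratch buffer is a VLA sized to the current subrange, where the fixed-size alternative would be an int buf[MAXSIZE] in every frame.
// Rough sketch only: recursive merge sort where the merge buffer is a VLA
// sized to the current subrange. The fixed-size alternative reserves MAXSIZE
// ints in every frame; the heap alternative allocates on every level. Only
// reasonable for modest n, since the top-level buffer is n ints of stack.
static void msort(int *a, int n)
{
    if (n < 2)
        return;
    int mid = n / 2;
    msort(a, mid);            // sort left half
    msort(a + mid, n - mid);  // sort right half
    int buf[n];               // VLA: exactly n ints for this level
    int i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        buf[k++] = a[i++];
    while (j < n)
        buf[k++] = a[j++];
    for (k = 0; k < n; k++)   // copy the merged result back in place
        a[k] = buf[k];
}
Because each frame only declares buf after its recursive calls return, at most one buffer (n ints) is live at any moment; with int buf[MAXSIZE], in practice every live frame would reserve MAXSIZE ints for the whole recursion.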
Wikipedia has an example which does not fulfill my criteria, because the practical difference from using the heap seems irrelevant, at least without context. It is also non-ideal because, without more context, it seems the item count could very well cause a stack overflow.
Note: I'm specifically after example code, or a suggestion of an algorithm that would benefit from this, for me to implement the example myself.
Explanation / Answer
I just hacked up a little program that generates a set of random numbers restarting at the same seed each time, to ensure that it's "fair" and "comparable". As it goes along, it figures out the min and max of these values. And when it has generated the set of numbers, it counts how many are above the average of min and max.
For VERY small arrays, it shows a clear benefit with VLAs over std::vector<>.
It is not a real problem, but we can easily imagine something where we'd be reading the values from a small file instead of using random numbers, and doing some other, more meaningful counting/min/max calculations with the same sort of code.
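For illustration only (this isn't part of the measurements below, and the file format is made up: a count followed by that many integers), that file-reading variant could look something like this; a real version should also sanity-bound n before declaring the VLA:
#include <cstdio>
// Illustrative only: read a small, caller-supplied file whose first number is
// the count and whose remaining numbers are the values, then do the same
// min/max/count-above-average work on a VLA sized from the count just read.
int count_above_mid_from_file(const char *path)
{
    FILE *fp = std::fopen(path, "r");
    if (!fp)
        return -1;
    int n;
    if (std::fscanf(fp, "%d", &n) != 1 || n <= 0)
    {
        std::fclose(fp);
        return -1;
    }
    int v[n];                  // VLA sized by a value only known at runtime
    int vmin = 0, vmax = 0;
    for (int i = 0; i < n; i++)
    {
        if (std::fscanf(fp, "%d", &v[i]) != 1)
        {
            std::fclose(fp);
            return -1;
        }
        if (i == 0 || v[i] < vmin)
            vmin = v[i];
        if (i == 0 || v[i] > vmax)
            vmax = v[i];
    }
    std::fclose(fp);
    int avg = (vmax + vmin) / 2;
    int count = 0;
    for (int i = 0; i < n; i++)
        if (v[i] > avg)
            count++;
    return count;
}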
For VERY small values of the "number of random numbers" (x) in the relevant functions, the VLA solution wins by a huge margin. As the size grows, the "win" gets smaller, and given sufficient size the vector solution appears to be MORE efficient. I didn't study that variant too much, because once we start having thousands of elements in a VLA, that's not really what they were meant for...
And I'm sure someone will tell me that there's some way of writing all this code with a bunch of templates and getting it to do this without running more than the RDTSC and cout bits at runtime... But I don't think that's really the point.
When running this particular variant, I get about a 10% difference between func1 (VLA) and func2 (std::vector).
count = 9884
func1 time in clocks per iteration 7048685
count = 9884
func2 time in clocks per iteration 7661067
count = 9884
func3 time in clocks per iteration 8971878
This is compiled with: g++ -O3 -Wall -Wextra -std=gnu++0x -o vla vla.cpp
Here's the code:
#include <iostream>
#include <vector>
#include <cstdint>
#include <cstdlib>
using namespace std;
const int SIZE = 1000000;
uint64_t g_val[SIZE];
// Read the CPU's timestamp counter (x86 RDTSC) for a crude cycle count.
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}
// func1: VLA version - the scratch array lives on the stack, sized by x.
int func1(int x)
{
    int v[x];
    int vmax = 0;
    int vmin = x;
    for(int i = 0; i < x; i++)
    {
        v[i] = rand() % x;
        if (v[i] > vmax)
            vmax = v[i];
        if (v[i] < vmin)
            vmin = v[i];
    }
    int avg = (vmax + vmin) / 2;
    int count = 0;
    for(int i = 0; i < x; i++)
    {
        if (v[i] > avg)
        {
            count++;
        }
    }
    return count;
}
// func2: std::vector resized up front - one heap allocation per call.
int func2(int x)
{
    vector<int> v;
    v.resize(x);
    int vmax = 0;
    int vmin = x;
    for(int i = 0; i < x; i++)
    {
        v[i] = rand() % x;
        if (v[i] > vmax)
            vmax = v[i];
        if (v[i] < vmin)
            vmin = v[i];
    }
    int avg = (vmax + vmin) / 2;
    int count = 0;
    for(int i = 0; i < x; i++)
    {
        if (v[i] > avg)
        {
            count++;
        }
    }
    return count;
}
// func3: std::vector grown with push_back - may reallocate as it grows.
int func3(int x)
{
    vector<int> v;
    int vmax = 0;
    int vmin = x;
    for(int i = 0; i < x; i++)
    {
        v.push_back(rand() % x);
        if (v[i] > vmax)
            vmax = v[i];
        if (v[i] < vmin)
            vmin = v[i];
    }
    int avg = (vmax + vmin) / 2;
    int count = 0;
    for(int i = 0; i < x; i++)
    {
        if (v[i] > avg)
        {
            count++;
        }
    }
    return count;
}
// Run one benchmark function over the same seeded random sequence and report
// the accumulated count and the elapsed timestamp-counter ticks.
void runbench(int (*f)(int), const char *name)
{
    srand(41711211);
    uint64_t t = rdtsc();
    int count = 0;
    for(int i = 20; i < 200; i++)
    {
        count += f(i);
    }
    t = rdtsc() - t;
    cout << "count = " << count << endl;
    cout << name << " time in clocks per iteration " << dec << t << endl;
}
struct function
{
    int (*func)(int);
    const char *name;
};
#define FUNC(f) { f, #f }
function funcs[] =
{
    FUNC(func1),
    FUNC(func2),
    FUNC(func3),
};
int main()
{
    for(size_t i = 0; i < sizeof(funcs)/sizeof(funcs[0]); i++)
    {
        runbench(funcs[i].func, funcs[i].name);
    }
}
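One more variant I didn't measure (a hypothetical func4, not reflected in the output above): reuse a single vector across calls, so each call only pays for resize and access rather than a fresh allocation. Comparing it against func2 would show how much of the gap is the allocator itself. To try it, define it before the funcs[] table and add FUNC(func4) there.
// Hypothetical func4 (not measured above): reuse one vector across calls so
// the per-call cost is resize + access, not a fresh heap allocation.
int func4(int x)
{
    static vector<int> v;   // capacity persists and only grows between calls
    v.resize(x);
    int vmax = 0;
    int vmin = x;
    for(int i = 0; i < x; i++)
    {
        v[i] = rand() % x;
        if (v[i] > vmax)
            vmax = v[i];
        if (v[i] < vmin)
            vmin = v[i];
    }
    int avg = (vmax + vmin) / 2;
    int count = 0;
    for(int i = 0; i < x; i++)
    {
        if (v[i] > avg)
            count++;
    }
    return count;
}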