To read an entire file efficiently you need to know the file size FIRST.
So what's the fastest way to programmatically find the file size? If you read Stack Overflow you'll find a number of suggestions, but many use fseek to jump to the end of the file, grab the resulting offset, and use that as the file size…
TL;DR – This works, but is 30% slower than using stat or fstat
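For reference, the seek-to-the-end version looks something like this (a minimal sketch, with error handling kept short):

```c
#include <stdio.h>

/* Get a file's size by seeking to the end of a stream. */
long file_size_seek(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)  /* jump to the end of the file */
        size = ftell(fp);             /* current offset == file size */

    fclose(fp);
    return size;
}
```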
I just ran a bunch of tests comparing fseek, lseek, stat, and fstat, and also comparing file streams against file descriptors, to see which is the fastest. For the test I created a 100M file.
TL;DR – using file descriptors, fstat, and read was the fastest, and using file streams and fseek was the slowest. Go to the bottom to see the real slowest.
I ran this on a small Linux box I have running a headless Arch Linux server. Each test cycle: check the file size, malloc a buffer, read the entire file into the buffer, close the file, free the buffer. I ran the test 3 times with 1000 cycles each time, using clock_gettime to calculate the elapsed time.
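The timing harness was roughly this shape (a sketch, not the exact benchmark code; fn stands in for one full test cycle):

```c
#include <time.h>

/* Time `cycles` runs of `fn`, in milliseconds, using a monotonic clock. */
double elapsed_ms(void (*fn)(void), int cycles)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < cycles; i++)
        fn();  /* one size-check/malloc/read/close/free cycle */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```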
Comparing just the time it takes to get the file size, stat and fstat were at least 30% faster than fseek and lseek.
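For comparison, the stat and fstat versions are a single call each and never touch the file offset (a minimal sketch):

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Get a file's size by path, without opening it. */
off_t file_size_stat(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}

/* Get a file's size from an already-open descriptor. */
off_t file_size_fstat(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}
```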
Comparing just the speed of file streams vs file descriptors, they were nearly the same – descriptors were about 1-3% faster.
Comparing the whole sequence (getting the file size, opening the file, malloc'ing a buffer, reading the entire 100M, closing the file, and freeing the buffer), using file descriptors and fstat was 6-8% faster than using fseek or lseek. That's probably because the bulk of the time is spent reading the file rather than getting the file size, which dilutes the overall benefit.
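Putting the winning combination together, the descriptor/fstat path looks something like this (a sketch, not the exact benchmark code):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a file, size it with fstat, read it all, then clean up.
   Returns 0 on success, -1 on any failure. */
int read_whole_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    char *buf = malloc(st.st_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    /* For a regular file a single read usually returns everything,
       though read() is allowed to return fewer bytes in general. */
    ssize_t n = read(fd, buf, st.st_size);

    close(fd);
    free(buf);
    return (n == st.st_size) ? 0 : -1;
}
```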
BTW – do not use fgetc to read the file 1 character at a time. It is crazy inefficient and really, really slow. Like 1700% slower!
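For the curious, that anti-pattern looks like this (a sketch); every single byte pays for a function call, which is where the slowdown comes from:

```c
#include <stdio.h>

/* The slow way: consume a file one character at a time. */
long count_bytes_fgetc(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)  /* one libc call per byte */
        n++;
    return n;
}
```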