To read an entire file efficiently you need to know the file size FIRST.
So what’s the fastest way to find a file’s size programmatically? If you read Stack Overflow or ask ChatGPT (which appears to draw on answers from Stack Overflow), you’ll find a number of suggestions, but many of them use fseek to jump to the end of the file, read back the current position with ftell, and use that as the file size…
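For reference, the fseek approach usually looks something like the sketch below. It is a minimal version that assumes a POSIX system such as Linux, where seeking to SEEK_END on a binary stream behaves as expected. The name file_size_fseek is hypothetical, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: size of the file at `path` in bytes, found by
 * seeking a FILE stream to the end and asking ftell() where we are.
 * Returns -1 if the file cannot be opened or the seek/tell fails. */
long file_size_fseek(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);   /* position at end of file == size in bytes */

    fclose(fp);
    return size;
}
```

Note that ftell returns a long, so on platforms where long is 32 bits this approach fails for files larger than about 2 GB.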
TL;DR – This works, but is 30% slower than using stat or fstat
I just ran a bunch of tests comparing fseek, lseek, stat, and fstat, and also comparing file streams with file descriptors, to see which seems to be the fastest.
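Here is roughly what the other three look like, as a sketch assuming Linux/POSIX headers; the helper names are hypothetical. The key difference is that lseek moves a descriptor’s file offset to the end, while stat and fstat just read the size (st_size) out of the file’s metadata without moving anything.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* lseek(): open a descriptor and move its offset to the end of the file.
 * The new offset returned by lseek() is the file size (or -1 on error). */
off_t file_size_lseek(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}

/* stat(): look up the file's metadata by path; no open() is needed. */
off_t file_size_stat(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_size;
}

/* fstat(): the same metadata lookup, but for a descriptor that is already
 * open, which is convenient when you are about to read() from it anyway. */
off_t file_size_fstat(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}
```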
TL;DR – using file descriptors, fstat and read was the fastest and using file streams and seek was the slowest. Jump to the bottom to see the real slowest.
I ran the test on a small Linux box running a headless Arch Linux server. Each test cycle checked the file size, opened the file, malloc’d a buffer, read the entire file into the buffer, closed the file, and freed the buffer. The test file was a 100 MB file I created for this purpose.
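A single cycle using the fastest combination (a file descriptor plus fstat) might look like the sketch below. This is an assumption about how such a cycle could be written, not the exact code behind these measurements; read_whole_file is a hypothetical name, and the caller is expected to free() the returned buffer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open(), fstat() for the size, malloc(), read() in a
 * loop, close(). On success returns the buffer (caller must free()) and
 * stores the byte count in *out_len; returns NULL on any error. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;

    /* malloc(0) may return NULL, so always ask for at least one byte. */
    char *buf = malloc(size > 0 ? size : 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than requested, so keep going until
     * the buffer is full or we hit end of file. */
    size_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;          /* file shrank since fstat(): stop early */
        total += (size_t)n;
    }

    close(fd);
    *out_len = total;
    return buf;
}
```

The read loop matters: read() is allowed to return fewer bytes than requested, so one call is not guaranteed to fill a 100 MB buffer.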
I ran the test 3 times with 1000 cycles each time, using clock_gettime to calculate the elapsed time.
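Timing with clock_gettime can be as simple as taking a timestamp before and after the cycles and subtracting. The sketch assumes CLOCK_MONOTONIC (which clock was actually used isn’t stated), and now_monotonic and elapsed_seconds are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() and CLOCK_MONOTONIC */
#include <time.h>

/* Current time from the monotonic clock, which is not affected by
 * changes to the system wall-clock time. */
struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Elapsed time in seconds between two timestamps; the nanosecond
 * difference may be negative, which the addition handles correctly. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

In a benchmark you would call now_monotonic() before the 1000 cycles and again after them, then divide elapsed_seconds() by 1000 to get a per-cycle figure.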
Comparing JUST the time it takes to get the file size, stat and fstat were at least 30% faster than fseek and lseek.
Comparing just the speed of using file streams vs file descriptors, they were pretty nearly the same – descriptors were about 1-3% faster.
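For comparison, a file-stream version of the same cycle is sketched below. It uses fseek and ftell for the size and a single fread for the data, i.e. the streams-and-seek combination that came out slowest. As before, read_whole_file_stream is a hypothetical name and the sketch assumes a POSIX system.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: fopen(), fseek()/ftell() for the size, malloc(),
 * one fread(), fclose(). Returns a malloc'd buffer (caller frees) and
 * stores the byte count in *out_len, or returns NULL on error. */
char *read_whole_file_stream(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    /* Rewind to the start before reading. */
    if (size < 0 || fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread() only returns a short count on end of file or error. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    if (got < (size_t)size && ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    *out_len = got;
    return buf;
}
```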
Comparing the full cycle (getting the file size, opening the file, malloc’ing a buffer, reading the entire 100 MB, closing the file, and freeing the buffer), file descriptors with fstat were 6-8% faster than fseek or lseek. That is probably because the bulk of the time is spent reading the file rather than getting its size, which dilutes the overall performance benefit.
BTW, do not use fgetc to read the file 1 character at a time. This is crazy inefficient and really, really slow! Like 1700% slower!!!!
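To be clear about what to avoid, the byte-at-a-time pattern looks like the sketch below (read_with_fgetc is a hypothetical name). stdio still buffers the underlying reads, so this is not one system call per byte, but every single byte still costs its own fgetc call and loop iteration instead of one bulk copy.

```c
#include <stdio.h>

/* The slow pattern: copy a file into `buf` one byte at a time with fgetc().
 * Returns the number of bytes copied (at most `cap`), or -1 on error. */
long read_with_fgetc(const char *path, char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = 0;
    int c;
    while (n < cap && (c = fgetc(fp)) != EOF)   /* one call per byte */
        buf[n++] = (char)c;

    fclose(fp);
    return (long)n;
}
```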
As for my warning about ChatGPT, unfortunately the problem is self-reinforcing. More and more programmers are going to ChatGPT and other services for answers (some to questions posted on Stack Overflow) and then unquestioningly posting the output. Since ChatGPT sourced its information from sites like Stack Overflow, and its answers are now being posted back to those same sites, the answer is further reinforced (even if it is wrong).
So PLEASE, check before blindly pasting code from AI sites.