How Can I Read a Large File Line by Line in PHP?
Every now and then, you will have to read and process a large file in PHP. If you are not careful when reading a large file, you can easily exhaust the memory available to your script. Fortunately, PHP provides several built-in functions for reading large files (by large, I mean files over 1GB in size) either line by line or one chunk at a time. In this article, you will learn about the pros and cons of these techniques.
Reading a File Line by Line into an Array Using file()
You can use the file() function in PHP to read an entire file into an array. This function stores each line, along with its newline character, in its own array element. If you intend to read a file line by line and store the result in an array, the file() function seems to be a natural choice.
One major drawback of this method is that it reads the whole file into memory at once. In other words, it hands you the file already split into lines, but it does not actually read the file one line at a time. This means that you won’t be able to use it to read very large files.
PHP
$life_of_bee_lines = file('life-of-the-bee.txt');
$short_lines = 0;
// Output: Total Lines: 6305.
echo "Total Lines: " . count($life_of_bee_lines) . ".\n";
foreach ($life_of_bee_lines as $line) {
    if (strlen($line) <= 30) {
        $short_lines++;
    }
}
// Output: Total number of short lines: 778.
echo "Total number of short lines: $short_lines.\n";
In the example above, I have read a text file called The Life of the Bee from Project Gutenberg. This file is just 360KB in size, so reading it using the file() function was not a problem. Now we will learn how to read a file that might be over 1GB in size without exhausting our memory.
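As a side note on the newline behaviour described in this section, file() accepts optional flags that change what ends up in the array. The sketch below is only an illustration: it writes a small temporary file (the file contents and variable names are assumptions, not part of the example above) and compares the default result with the result of passing FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES.

```php
<?php
// Hypothetical throwaway file so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'file_demo_');
file_put_contents($path, "first line\n\nsecond line\n");

// Default: each element keeps its trailing newline, empty lines included.
$raw = file($path);

// Flags: strip the newline from each element and drop empty lines.
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

unlink($path);

echo count($raw) . "\n";        // 3
echo strlen($raw[0]) . "\n";    // 11 ("first line" plus the newline)
echo count($clean) . "\n";      // 2
echo strlen($clean[0]) . "\n";  // 10
```

Either way, the whole file is still loaded into memory, so the flags do not help with very large files.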
Reading a Large File One Line at a Time Using fgets()
The fgets() function in PHP reads the current line from an open file pointed to by the resource handle. The function stops reading after reaching the specified length (it returns at most length - 1 bytes), encountering a newline, or reaching the end of the file, whichever comes first. It returns false if there is no more data to be read. This means that we can check whether a file has been read completely by comparing the return value of fgets() to false.
If you do not specify a length, fgets() keeps reading until it reaches the end of the line. In other words, the maximum memory that you need with fgets() depends on the longest line in the file. Unless the file that you are reading has very long lines, you won’t exhaust your memory like you could with the file() function in the previous section.
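If long lines are a concern, the optional length argument puts an upper bound on how much fgets() returns per call. The following is a minimal sketch under stated assumptions: the temporary file, its contents and the $max_length value are made up for illustration. It shows a ten-character line coming back in several pieces.

```php
<?php
// Hypothetical throwaway file: one "long" line and one short line.
$path = tempnam(sys_get_temp_dir(), 'fgets_demo_');
file_put_contents($path, "abcdefghij\nxy\n");

$handle = fopen($path, 'r');
$max_length = 5; // fgets() returns at most $max_length - 1 bytes per call
$pieces = [];
if ($handle) {
    while (($piece = fgets($handle, $max_length)) !== false) {
        $pieces[] = $piece;
    }
    fclose($handle);
}
unlink($path);

// Prints: ["abcd","efgh","ij\n","xy\n"]
echo json_encode($pieces) . "\n";
```

The trade-off is that one call no longer corresponds to one line. A piece is the end of a line only if it ends with a newline, so code that counts or processes lines has to check for that.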
PHP
$handle = fopen("life-of-the-bee.txt", "r");
$total_lines = 0;
$short_lines = 0;
if ($handle) {
    $line = fgets($handle);
    while ($line !== false) {
        $total_lines++;
        if (strlen($line) <= 30) {
            $short_lines++;
        }
        $line = fgets($handle);
    }
    fclose($handle);
}
// Output: Total Lines: 6305.
echo "Total Lines: $total_lines.\n";
// Output: Total number of short lines: 778.
echo "Total number of short lines: $short_lines.\n";
Inside the if block, we read the first line of our file and, if it is not strictly equal to false, we enter the while loop and process the whole file one line at a time. We could also combine the reading and the check into a single line, as in the following example.
PHP
$handle = fopen("life-of-the-bee.txt", "r");
$total_lines = 0;
$short_lines = 0;
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        $total_lines++;
        if (strlen($line) <= 30) {
            $short_lines++;
        }
    }
    fclose($handle);
}
// Output: Total Lines: 6305.
echo "Total Lines: $total_lines.\n";
// Output: Total number of short lines: 778.
echo "Total number of short lines: $short_lines.\n";
I would like to point out that, just like file(), the fgets() function includes the newline character in the string it returns. You can verify this by using var_dump().
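To make the newline visible without relying on the book file, here is a small sketch that compares the length of each line as returned by fgets() with its length after rtrim(). The temporary file and its two lines are assumptions made up for this illustration.

```php
<?php
// Hypothetical throwaway file with two short lines.
$path = tempnam(sys_get_temp_dir(), 'newline_demo_');
file_put_contents($path, "alpha\nbeta\n");

$handle = fopen($path, 'r');
$lengths = [];
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        // strlen($line) counts the trailing newline; rtrim() removes it.
        $lengths[] = [strlen($line), strlen(rtrim($line, "\r\n"))];
    }
    fclose($handle);
}
unlink($path);

foreach ($lengths as [$with_newline, $without_newline]) {
    echo "$with_newline $without_newline\n";
}
// Prints:
// 6 5
// 5 4
```

This matters for checks like strlen($line) <= 30, where the newline counts toward the length unless you trim it first.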
Reading a Large File in Smaller Pieces Using fread()
One drawback of reading files using fgets() is that you will need a substantial amount of memory to read files that have very long lines. This situation is unlikely, but if you can’t make any assumptions about the file, it is better to read it in small chunks.
This is where the fread() function in PHP comes to our rescue. This function reads up to length bytes from a file. You are the one who specifies length, so you don’t have to worry about running out of memory.
PHP
$handle = fopen("large-story.txt", "r");
$chunk_size = 1024*1024;
$iterations = 0;
if ($handle) {
    while (!feof($handle)) {
        $iterations++;
        $chunk = fread($handle, $chunk_size);
    }
    fclose($handle);
}
// Output: Read the whole file in 13 iterations.
echo "Read the whole file in $iterations iterations.";
In the above example, we read the large text file 1MB (1024*1024 bytes) at a time, and it took 13 iterations to read the whole file. You can set $chunk_size to a reasonable value and read a very large file without exhausting your memory. You could use this technique to read a 1GB file in about 1024 iterations while holding only about 1MB of it in memory at any time. For even larger files, you can increase $chunk_size if you want to keep the number of iterations low.
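The example above reads each chunk but does nothing with it. As a rough sketch of processing chunks as they arrive, the code below counts lines by counting newline characters in each chunk. The temporary file, its contents and the small 4096-byte chunk size are assumptions chosen so the sketch runs quickly on its own.

```php
<?php
// Hypothetical throwaway file: 1000 identical lines.
$path = tempnam(sys_get_temp_dir(), 'fread_demo_');
file_put_contents($path, str_repeat("a line of text\n", 1000));

$handle = fopen($path, 'rb');
$chunk_size = 4096; // deliberately small for the demo
$iterations = 0;
$total_lines = 0;
if ($handle) {
    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        $iterations++;
        // A newline is a single byte, so it is never split across chunks.
        $total_lines += substr_count($chunk, "\n");
    }
    fclose($handle);
}
unlink($path);

// Prints: Counted 1000 lines.
echo "Counted $total_lines lines.\n";
```

Counting newlines works here because each newline sits entirely inside one chunk. A line itself can straddle two chunks, so any processing that needs complete lines has to carry the unfinished tail of one chunk over to the next.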
Quick Summary
Let’s recap everything that we have covered in this tutorial.
If you want to read files line by line into an array, using the file() function would be the right choice. The only drawback is that this function reads the whole file at once, so you won’t be able to read very large files without exhausting the memory.
If the files you are reading are very large and you have no plans to store the individual lines in an array, using the fgets() function would be the right choice. The amount of memory that your script needs depends only on the length of the longest line in the file that you are reading. This means that the file could be over 1GB in size, but if the individual lines are not very long, you will not exhaust your memory.
If you are reading large files and have no idea how long individual lines could be in that particular file, using fread() would be your best bet. This function lets you specify the size of the chunks in which you want to read the file. Another advantage of fread() over fgets() is that you will not have to perform a lot of iterations if the file you are reading contains a large number of empty lines.
Let me know if there is anything that you would like me to clarify. Also, you are more than welcome to comment if you know other techniques for reading large files one line at a time in PHP.
awesome, however how to fseek in that chunk?
as I need to start from line 1000 of that file before chunk?
Is there a way to do this in a non-blocking function. As soon as you execute while, your system will do nothing else until that completes and that’s just bad form