Skip to content

codebit-hub/Get_next_line

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This project has been created as part of the 42 curriculum by dporhomo.

Score Language Completion Date

Project Status: ✅ Completed with Bonus

Get Next Line

Description

get_next_line project involves writing a function capable of reading a text file (or any other file descriptor) one line at a time.

The primary goal of this project is to learn about static variables in C, which allow data to persist between function calls. This project also reinforces concepts of memory management (allocation and freeing), buffer manipulation, and the lifecycle of a file descriptor.

The result is a versatile function:

char *get_next_line(int fd);

which can be reused in future C projects to parse text files efficiently.


Main criteria

  • Repeated calls to get_next_line function read the text pointed to by the file descriptor, one line at a time.
  • The function returns the line that was read. Each line is ended with '\n', unless the end of the file (EOF) is reached or the file does not end with '\n'.
  • The function returns NULL if the end of the file is reached or if an error occurs.
  • The function can read from a file or from the standard input.
  • Only 1 static variable is used.
  • The function is able to read from multiple file descriptors at the same time.

Algorithm Explanation & Justification

The core challenge of this project is that the read() function reads a fixed number of bytes (BUFFER_SIZE), which rarely aligns perfectly with the end of a line (\n). To solve this, the Stash Algorithm is used.

Logic Overview

  1. Read & Accumulate Data is read from the file descriptor in chunks of BUFFER_SIZE and appended to a static string (the stash) using a custom ft_strjoin, until a newline (\n) is found or EOF is reached.

  2. Extract Line Once a newline is detected, characters from the beginning of the stash up to and including the newline are extracted and returned.

  3. Clean Stash Any remaining characters after the newline are saved back into the stash. The previous stash is freed to prevent memory leaks.

Justification

  • Static Variables A static variable is required because the function returns a single line while potentially reading part of the next line. This leftover data must persist between calls.

  • Robustness The algorithm works for any BUFFER_SIZE. Whether the buffer size is 1 byte or very large, the cycle accumulate → extract → store ensures correctness.

  • Bonus Support To enable simultaneous reading of multiple files, at first, I replaced a single static pointer with an array of pointers:

    static char *stash[1024];

    This way the function could handle multiple file descriptors simultaneously without mixing up returned lines. The only limitation is creating a needlessly large string of char pointers 'static char *stash[1024]', enabling reading of only 1024 files. A larger number of files will crash the operation of the function.

    To remedy this, I decided to replace the static char *stash[1024] with a static struct t_list, which allowed me to use linked nodes, containing static char pointers obtaining data from designated files linked via file descriptors. This strategy is also more efficient memory wise, because nodes are being created only as necessary based on the file usage. There is no need for allocating a large block of memory for a large and limited number of char pointers, as it was with the static char *stash[1024] approach.

typedef struct s_list
{
   int				fd;
   char			*stash;
   struct s_list	*next;
}	t_list;

Limitations

There are hard limits set in by the predefined scope of the used variable types.

  1. 'int' variables can hold a maximum value of 2,147,483,647. If the *stash (a single line or accumulated reads) grows larger than ~2.14 GB, the index variable 'int i' will overflow (wrap around to negative numbers). This will likely cause a Segmentation Fault or an infinite loop when trying to access stash[i]. Fix: index variables were changed from int type to size_t type, which is designed to match the maximum possible memory size of the system (unsigned 64-bit). The file descriptor variable was left as 'int fd', which limits the possible number of simultaneously opened files to 2,147,483,647. Is is acceptable limitation provided by the predefined declaration of the get_next_line function. (Practically, the operating System enforces a much lower limit on open files (usually 1024 to 4096 soft limit), which is long before the stated int overflows.)

    Differences between variable types, such as int, size_t and ssize_t:

    	int	(+/-)		32 bits (4 bytes)	~2 Billion		Basic counters, math, return codes.
    	size_t	(+)		64 bits (8 bytes)	~18 Quintillion	Memory sizes, array indices, sizeof, strlen.
    	ssize_t	(+/-)	64 bits (8 bytes)	~9 Quintillion	System call returns (read, write) that need to return size OR error (-1).
    
  2. Physical memory limit: Available RAM Since 'stash' is allocated using 'malloc', it resides on the Heap, which is limited by available RAM. If 'malloc' attempts to read a file larger than the available RAM, it will fail and the function get_next_line will return NULL, as expected.

  3. The Performance Limit (Time) There is a 'soft limit' in operation of 'ft_strjoin' function. Every time a new chunk of data (BUFFER_SIZE) is read from a file, 'ft_strjoin' allocates a new string and copies the entire stash into it. If the file contains extremely long lines (e.g. 10 MB or larger) and the BUFFER_SIZE is set to only 10 bytes, those 10 MB of data will be read in 10 bytes increments 1,000,000 times. This is O(N²) complexity. While not crashing, the program will become exponentially slower as the line gets longer. For a file with a single massive line (e.g., hundreds of MBs), the program might appear to freeze. However, it is acceptable under the scope of this project.

  4. The BUFFER_SIZE Limit To ensure that the function operates even when the BUFFER_SIZE has not been specifically set by the user upon launching of the program, the default BUFFER_SIZE in the header file is set to 1024 bytes.

    The maximum value that can be assigned to the BUFFER_SIZE is dictated by the read() function prototype 'ssize_t read(int fd, void *buf, size_t count);'. It takes an unsigned size (size_t count) as input, but it returns a signed size (ssize_t) so it can return -1 for errors. It means that if the read() function attempts to read more bytes than fit in a signed long long integer (i.e., greater than LLONG_MAX or 9,223,372,036,854,775,807 = 2^63), the return value could overflow or be undefined. By capping BUFFER_SIZE at this limit, it is ensured that read() function behaves predictably.

    By forcing BUFFER_SIZE to 0 in the header if BUFFER_SIZE is set to a value greater than LLONG_MAX (9,223,372,036,854,775,806 = 2^63 - 1), this safety check is triggered immediately, causing get_next_line function to safely return NULL instead of attempting a massive malloc or crashing the system.

    The safety guard is defined in the header file, as follows:

# if BUFFER_SIZE > 9223372036854775806
#  undef BUFFER_SIZE
#  define BUFFER_SIZE 0
# endif
  1. Binary Files The current solution relies on ft_strlen and ft_strchr functions. If a file contains a null byte (\0) in the middle of a line, the code will stop reading/copying there. This is a standard and expected behavior for this project (as it's meant for text), but technically the current solution doesn't support "binary" reading fully. Meeting this criterion lies beyond the scope of this project.

Instructions

Requirements

  • GCC or Clang compiler

  • Standard C libraries:

    • unistd.h
    • stdlib.h
    • fcntl.h

Compilation

Compile the project by including all source files and defining BUFFER_SIZE:

for the mandatory part:

cc -Wall -Wextra -Werror -D BUFFER_SIZE=42 main.c GNL/get_next_line.c GNL/get_next_line_utils.c  -o gnl

or for the bonus part:

cc -Wall -Wextra -Werror -D BUFFER_SIZE=42 main_bonus.c GNL/get_next_line_bonus.c GNL/get_next_line_utils_bonus.c -o gnl

Note: main.c is not part of the project submission. It is only required for local testing.

Execution

Run the compiled executable:

./gnl

Usage Example

Example main.c below is demonstrating how to call get_next_line. However, the provided main.c or main_bonus.c can be safely used for this purpose.

#include "get_next_line.h"
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	int		fd;
	char	*line;

	fd = open("text.txt", O_RDONLY);
	if (fd < 0)
		return (1);

	while ((line = get_next_line(fd)) != NULL)
	{
		printf("%s", line);
		free(line);
	}
	close(fd);
	return (0);
}

Resources

References


AI Usage

AI assistance was used for:

  • Conceptual Understanding Clarifying the difference between stack, heap, and static memory allocation, and why int * arrays are unsuitable for string manipulation compared to char * arrays.

  • Validation Interpreting Valgrind logs to confirm that "still reachable" memory blocks are expected behavior for static variables and not memory leaks.

About

Coding a function capable of reading a text file one line at a time (application of a static variable)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors