AST (Abstract Syntax Tree)

Abstract Syntax Tree.

Author

jguillem pulgamecanica

Defines

REDIR(node)

Convenience accessor: get t_redir * from a t_list node.

Enums

enum t_node_type

# Example
$> (which ls) | echo "dude" && which top
https://raw.githubusercontent.com/leowz/42sh/main/docs/assets/ast_1.png

Warning

Important to don’t change the order

Values:

enumerator NODE_COMMAND
enumerator NODE_PIPE
enumerator NODE_AND
enumerator NODE_OR
enumerator NODE_SEQUENCE
enumerator NODE_SUBSHELL
enumerator NODE_BLOCK
enumerator NODE_BACKGROUND

Functions

char *ast_to_string(t_ast *ast)
char *cmd_to_string(t_cmd *cmd)
t_ast *ast_new_command(t_cmd *cmd)
t_ast *ast_new_binary(t_node_type type, t_ast *left, t_ast *right)
t_ast *ast_new_group(t_node_type type, t_ast *child, t_list *redirs)
void ast_free(t_ast *node)
t_redir *redir_new(t_token_type type, int fd, char *target)
void redir_free(t_redir *redir)
struct t_redir
#include <ast.h>

Redirection data node.

Stored in t_cmd.redirs and t_group.redirs as t_list* (each node->content is a t_redir *).

Public Members

t_token_type type

Operator kind (TOK_REDIR_IN/OUT/APPEND/HEREDOC/…).

int fd

Source fd (-1 = default for the operator type).

char *target

Raw filename or fd-number string (unexpanded).

char *heredoc_delim

Delimiter word for << (raw, may be quoted).

int heredoc_fd

Read end of pipe holding collected heredoc body.

int heredoc_quoted

1 if delimiter was quoted (suppresses expansion).

struct t_cmd
#include <ast.h>

Simple command data.

Stored in t_ast.data as a union with further access t_ast->data.cmd

Public Members

char **argv

NULL-terminated array of raw/unexpanded argument strings.

int argc

Number of elements in argv (excluding NULL terminator).

t_list *assignments

t_list* of char* “NAME=value” (unexpanded).

t_list *redirs

t_list* of t_redir* in source order (left-to-right).

struct t_binary
#include <ast.h>

Binary operation data (pipe, &&, ||, ;).

Public Members

struct s_ast *left

Left-hand operand subtree.

struct s_ast *right

Right-hand operand subtree.

struct t_group
#include <ast.h>

Group data (subshell, block, background).

Public Members

struct s_ast *child

Subtree wrapped by the group.

t_list *redirs

t_list* of t_redir* applied to the whole group, e.g. (cmd) > file or { cmd; } 2>&1.

struct t_ast
#include <ast.h>

AST node - union-based for zero-overhead dispatch.

Public Members

t_node_type type

Discriminator selecting which arm of data is live.

t_cmd *cmd

data.cmd Live when type == NODE_COMMAND.

t_binary *binary

data.binary Live when type == NODE_PIPE/AND/OR/SEQUENCE.

t_group *group

data.group Live when type == NODE_SUBSHELL/BLOCK/BACKGROUND.

union t_ast data

Per-node payload selected by type.

How is the t_ast generated?

The abstract syntax tree is generated from the tokenized line. note: The shell must be provided in order to handle the heredoc, no other reason.

t_ast *parser_parse(t_list *tokens, t_shell *shell)

Ast to string

Stringify an AST subtree for human-readable job listings (t_jobs).

Author

pulgamecanica

Functions

static char *join_free(char *a, const char *b)
static char *append_redirs(char *out, t_list *redirs)
char *cmd_to_string(t_cmd *cmd)
char *ast_to_string(t_ast *ast)

Parser

Recursive-descent parser: transforms the token stream into an AST.

Author

jguillem pulgamecanica

Functions

t_ast *parser_parse(t_list *tokens, t_shell *shell)

consume a t_list* of tokens and return the AST root.

Returns NULL on syntax error (error printed to stderr).

The token list is NOT freed here, caller frees it with lexer_free_tokens().

this is the main function of the parser module

call parse list which wrap all the layers and check for EOF

parse_list (“;” and “&” separators) -> parse_and_or

parse_and_or (”&&” and “||” separators) -> parse_pipeline

parse_pipeline (“|” separator) -> parse_command

parse_command (read simple command) -> parse_subshell

parse_subshell recurse on parse_list

parse_heredoc walk ast and collect heredocs

int parser_collect_heredocs(t_ast *ast, t_shell *shell)

walk the AST after parsing and read heredoc content

from stdin for each << redirection.

shell is needed to read from the correct fd and to check SIGINT.

Returns 0 on success, -1 if SIGINT aborted heredoc input.

void heredoc_expand_config(t_redir *redir)
int parser_accept(t_parser *p, t_token_type type)
int is_redir(t_token_type type)
t_token *parser_peek(t_parser *p)
t_token *parser_next(t_parser *p)
t_ast *parse_subshell(t_parser *p)
t_ast *parse_block(t_parser *p)
t_ast *parse_command(t_parser *p)
t_ast *parse_simple_command(t_parser *p)
t_ast *parse_pipeline(t_parser *p)
t_ast *parse_and_or(t_parser *p)
t_ast *parse_list(t_parser *p)
void cmd_free(t_cmd *cmd)

Free a partially-built t_cmd on a parse error.

Releases argv (and every strdup’d entry), assignments list, redirs list (via redir_free), and the struct itself.

void redir_free(t_redir *redir)
struct t_parser
#include <parser.h>

Parser internal state.

Public Members

t_list *tokens

t_list* of t_token* produced by the lexer.

t_list *current

List node the parser is currently examining.

char *error

Human-readable error string (set on syntax error, freed by parser).

Parser Entry Point

main file of parser module

file for parser utils functions

Author

jguillem, pulgamecanica

Author

jguillem

Functions

t_ast *parser_parse(t_list *tokens, t_shell *shell)

consume a t_list* of tokens and return the AST root.

this is the main function of the parser module

call parse list which wrap all the layers and check for EOF

parse_list (“;” and “&” separators) -> parse_and_or

parse_and_or (”&&” and “||” separators) -> parse_pipeline

parse_pipeline (“|” separator) -> parse_command

parse_command (read simple command) -> parse_subshell

parse_subshell recurse on parse_list

parse_heredoc walk ast and collect heredocs

Simple Commands

file for parse command

Author

jguillem

Functions

t_ast *ast_new_command(t_cmd *cmd)
t_ast *parse_command(t_parser *p)
static t_cmd *command_init(void)

initialize t_cmd struct

static void parser_error_unexpected(t_parser *p, t_token *token)

handle t_parser error field

Parameters:
static char **command_size(t_parser *p, t_cmd *command)

allocate memory of t_cmd struct argv field

Parameters:
Returns:

char **argv

static char *collect_arith_token(t_parser *p)

helper to reconstruct “$((a + b)” from token into one string

Parameters:
Returns:

char *string reconstructed

static t_ast *command_build(t_parser *p, t_cmd *command)

fill the t_cmd struct

Parameters:
Returns:

t_ast struct

static int is_assignment(char *str)

Check whether str is a POSIX assignment word.

The portion BEFORE the first ‘=’ must be a valid identifier (alpha or ‘_’ first, then alnum or ‘_’). The portion AFTER the ‘=’ is the value and may contain any characters, including quotes - quote handling is the expander’s job.

Parameters:
  • str – char *

Returns:

1 if str is a NAME=VALUE assignment, 0 otherwise.

static void parse_assignment(t_parser *p, t_cmd *command)

append assignments to t_cmd struct

Parameters:
t_ast *parse_simple_command(t_parser *p)

Pipes

file for pipeline parser

Author

jguillem

Functions

t_ast *parse_pipeline(t_parser *p)

Logical Operators

file for && and || parser

Author

jguillem

Functions

static int detect_and_or(t_parser *p, t_node_type *type)

helper function to detect TOK_AND and TOK_OR

Parameters:
  • p – struct s_parser

  • type – enum e_node_type

Returns:

0 (no one), 1 (TOK_AND), 2 (TOK_OR)

t_ast *parse_and_or(t_parser *p)

Sequences and Background

file for parse list

Author

jguillem

Functions

static int detect_separator(t_parser *p, t_node_type *operator)

helper function for parse_list. Newlines act as command terminators (POSIX 2.10) so ; and a literal \n between commands are equivalent. This matters for multi-line () / {} groups and for any input that came through shell_read_logical_line, which joins physical lines with \n to preserve quote interiors and paren grouping.

Parameters:
  • p – struc s_parser pointer

  • operator – enum e_node_type pointer

Returns:

0 (no separator), 1 (; or

separator), 2 (& separator)

t_ast *parse_list(t_parser *p)

Subshells and Blocks

file for parse group

Author

jguillem

Functions

t_ast *ast_new_group(t_node_type type, t_ast *child, t_list *redirs)
static void parse_group_cleanup(t_ast *child, t_list *redirs)

Free the child AST and any partially-built redirs list.

Called from every error path in parse_group, parse_subshell and parse_block so a half-built subshell/block cannot leak its inner AST plus accumulated redirections on a syntax error (e.g. random-binary input that parses partway then dies).

static t_ast *parse_group(t_parser *p, t_ast *child, t_node_type node)

factorization of the end of parse_subshell and parse_block

Parameters:
  • p – struct s_parser pointer

  • child – struct s_ast pointer

  • node – enum e_node_type

Returns:

pointer on struct s_ast

t_ast *parse_subshell(t_parser *p)
t_ast *parse_block(t_parser *p)

Here-documents

file to handle heredoc

Author

jguillem

Functions

static char *strip_tab(char *origin)

strip the leading tabulations of origin

Parameters:
  • origin – : raw string

Returns:

an allocated cleaned string

static int write_line(int fd, char *line, size_t len)
static int pop_queued_heredoc(t_shell *shell, int fd)

Pop the next pre-collected heredoc body from shell->heredoc_body_queue and write it to fd.

The REPL’s shell_read_logical_line scans command lines for <<DELIM operators and pushes one body per heredoc onto the queue, in declaration order. The parser’s AST walk visits heredocs in the same order, so a plain FIFO pop is correct. Bodies are already tab-stripped per <<- rules and contain newline-terminated lines (or “” for empty bodies).

Returns:

0 on success, -1 on write failure.

static int read_heredoc(t_redir *redir, t_shell *shell, int fd)

Read or recover the heredoc body. If the REPL already pre-collected the body into shell->heredoc_body_queue (the case for any input that came through shell_read_logical_line, i.e. both interactive and piped stdin), we pop it. Otherwise we fall back to reading line-by-line via shell_read_line (used when parser_parse is called outside the REPL, e.g. unit tests or -c mode &#8212; though -c mode doesn’t yet handle heredocs since the lexer doesn’t recognise body lines inline).

Parameters:
  • redir – : a pointer on struct s_redir

  • shell – : shell state, used to reach the shared input reader

  • fd – : the backing-store fd the heredoc body is written to

Returns:

0 (success) | -1 (failure or SIGINT)

static int collect_heredoc(t_redir *redir, t_shell *shell)

collect one heredoc body into an unlinked temp file and store the readable fd (rewound to offset 0) in redir->heredoc_fd. There is no fork: the collector runs in the shell process and therefore shares the one stdin cursor used to read command lines.

Parameters:
  • redir – : pointer on a struct s_redir

  • shell – : shell state

Returns:

0 | -1

static int collect_heredocs_from_list(t_list *lst, t_shell *shell)

helper function for collect heredocs from command or subshell

store redirections

Parameters:
  • lst – : a struct s_list pointer

  • shell – : shell state

static int collect_heredocs_from_command(t_cmd *cmd, t_shell *shell)

collect redirections of commands

Parameters:
  • cmd – : struct s_cmd pointer

  • shell – : shell state

static int collect_heredocs_from_group(t_group *group, t_shell *shell)

collect redirections of group

Parameters:
  • group – : struct s_group pointer

  • shell – : shell state

void heredoc_expand_config(t_redir *redir)
static int ast_walk(t_ast *ast, t_shell *shell)

traverses the ast tree and collect heredocs content, stopping on SIGINT

Parameters:
  • ast – : a pointer on a struct s_ast

  • shell – : the shell struct

Returns:

0 on success, -1 on SIGINT

static void heredoc_sigint_handler(int signal)
int parser_collect_heredocs(t_ast *ast, t_shell *shell)

walk the AST after parsing and read heredoc content

from stdin for each << redirection.

shell is needed to read from the correct fd and to check SIGINT.

Returns 0 on success, -1 if SIGINT aborted heredoc input.

Variables

volatile sig_atomic_t g_sigint_heredoc = 0

Utilities

Functions

t_ast *ast_new_binary(t_node_type type, t_ast *left, t_ast *right)
int is_redir(t_token_type type)
t_token *parser_peek(t_parser *p)
t_token *parser_next(t_parser *p)
int parser_accept(t_parser *p, t_token_type type)

AST Memory Management

file for free memory functions

Author

jguillem

Functions

void redir_free(t_redir *redir)
static void free_argv(char **argv)
void cmd_free(t_cmd *cmd)

Free a partially-built t_cmd on a parse error.

Releases argv (and every strdup’d entry), assignments list, redirs list (via redir_free), and the struct itself.

void ast_free(t_ast *node)