Expander

Word expansion: quotes, parameter, tilde, field splitting.

The expander is invoked by the executor right before a simple command runs. It performs the POSIX expansion pipeline on the raw token strings produced by the lexer/parser:

  1. Tilde expansion: ~, ~user

  2. Parameter expansion: $VAR, ${VAR}, $?, $$, $0

  3. Quote handling: ', " and \ semantics

  4. Field splitting on $IFS (only on unquoted expansion output)

  5. Quote removal, done in-line during the char-by-char loop

Author

pulgamecanica

Command substitution $(), arithmetic $(()) and pathname (glob) expansion are listed as modular features in plan/04_expander.md and are not implemented in this module yet.

Functions

char *expand_word(struct s_shell *shell, const char *word)

Expand a word to a single string (no field splitting, no globbing).

Used for assignment values and redirection targets.

Parameters:
  • shell – The shell instance.

  • word – The raw token string from the parser.

Returns:

Newly-allocated string. Caller frees. NULL on allocation failure or when word is NULL.

char **expand_word_to_fields(struct s_shell *shell, const char *word)

Expand a word to multiple fields (argv words).

Performs all expansions plus field splitting on $IFS.

Parameters:
  • shell – The shell instance.

  • word – The raw token string from the parser.

Returns:

Newly-allocated NULL-terminated array of strings. Caller frees each element and the array itself. May return a zero-element array (just a NULL terminator) when the word expanded to nothing splittable. NULL on allocation failure or when word is NULL.

int expand_command(struct s_shell *shell, t_cmd *cmd)

Expand a whole simple command in place: argv, assignments, redirs.

Replaces cmd->argv with a new array containing the expanded fields, updates cmd->argc, rewrites each “NAME=value” in cmd->assignments, and rewrites redir->target on every non-heredoc redirection.

Parameters:
  • shell – The shell instance.

  • cmd – The command node to expand.

Returns:

0 on success, -1 on allocation failure.

int xbuf_init(t_xbuf *buf)

Initialise an empty expansion buffer. Returns 0 on success, -1 on failure.

void xbuf_free(t_xbuf *buf)

Release a buffer’s storage. Safe to call on a NULL/empty buf.

int xbuf_putc(t_xbuf *buf, char c, char split)

Append one byte plus its split-mask flag.

Parameters:
  • buf – The buffer to append to.

  • c – The character to append.

  • split – 1 if c is subject to IFS splitting, 0 if literal.

Returns:

0 on success, -1 on allocation failure.

int xbuf_puts(t_xbuf *buf, const char *s, char split)

Append a NUL-terminated string with a uniform split flag.

Parameters:
  • buf – The buffer to append to.

  • s – The string to append (may be NULL → no-op success).

  • split – 1 if every byte of s is splittable, 0 if literal.

Returns:

0 on success, -1 on allocation failure.
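The buffer contract described above (parallel data and mask arrays, both NUL-terminated, grown on demand) can be illustrated with a minimal sketch. The growth policy (start at 16 bytes, double when full) is an assumption and not necessarily what the module does.

```c
#include <stdlib.h>
#include <string.h>

/* Expansion buffer with a parallel split mask, as documented for t_xbuf. */
typedef struct s_xbuf
{
    char    *data;  /* NUL-terminated expanded text */
    char    *mask;  /* parallel mask: 1 = splittable, 0 = literal */
    size_t  len;    /* bytes stored, excluding the NUL */
    size_t  cap;    /* allocated capacity of data and mask */
}   t_xbuf;

int xbuf_init(t_xbuf *buf)
{
    buf->len = 0;
    buf->cap = 16;                      /* assumed initial capacity */
    buf->data = malloc(buf->cap);
    buf->mask = malloc(buf->cap);
    if (!buf->data || !buf->mask)
    {
        free(buf->data);
        free(buf->mask);
        buf->data = NULL;
        buf->mask = NULL;
        buf->cap = 0;
        return (-1);
    }
    buf->data[0] = '\0';
    buf->mask[0] = '\0';
    return (0);
}

void xbuf_free(t_xbuf *buf)
{
    if (!buf)
        return ;
    free(buf->data);                    /* free(NULL) is a no-op */
    free(buf->mask);
    buf->data = NULL;
    buf->mask = NULL;
    buf->len = 0;
    buf->cap = 0;
}

int xbuf_putc(t_xbuf *buf, char c, char split)
{
    size_t  ncap;
    char    *tmp;

    if (buf->len + 2 > buf->cap)        /* room for c plus the NUL */
    {
        ncap = buf->cap ? buf->cap * 2 : 16;
        tmp = realloc(buf->data, ncap);
        if (!tmp)
            return (-1);
        buf->data = tmp;
        tmp = realloc(buf->mask, ncap);
        if (!tmp)
            return (-1);
        buf->mask = tmp;
        buf->cap = ncap;
    }
    buf->data[buf->len] = c;
    buf->mask[buf->len] = split;
    buf->len++;
    buf->data[buf->len] = '\0';
    buf->mask[buf->len] = '\0';
    return (0);
}

int xbuf_puts(t_xbuf *buf, const char *s, char split)
{
    if (!s)
        return (0);                     /* NULL string: no-op success */
    while (*s)
        if (xbuf_putc(buf, *s++, split) < 0)
            return (-1);
    return (0);
}
```

Because every byte carries its own mask byte, a later field-splitting pass can tell a space produced by an unquoted $VAR (mask 1) from a quoted literal space (mask 0).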

int expand_word_into(struct s_shell *shell, const char *word, t_xbuf *out)

Run the char-by-char expansion loop on word into out.

Handles single quotes (no expansion), double quotes (parameter expansion only), tilde at word start, $-expansions, and backslash escapes per POSIX.

Returns:

0 on success, -1 on allocation failure.

int expand_dollar(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)

Read a $… sequence beginning at input[*pos].

Supports $?, $$, $0, $NAME and ${NAME} (including ${?}, ${$}, ${0}). Unknown variables expand to the empty string. A bare $ followed by no recognised form is emitted literally.

Parameters:
  • shell – The shell instance (for variable lookup and $?).

  • input – The full word string.

  • pos – In/out: byte index, advanced past the consumed sequence.

  • dq – 1 when the $-sequence is inside double quotes (suppresses split).

  • out – Buffer to append the expanded value to.

Returns:

0 on success, -1 on allocation failure.
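The $-forms listed above can be sketched as follows. To stay self-contained, the sketch makes several assumptions: a hypothetical t_ctx stands in for struct s_shell, getenv() stands in for the shell's own variable table, and a fixed-size t_sbuf replaces t_xbuf (output is silently truncated at 255 bytes). The dq flag only changes the split mask, as described for the dq parameter.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the fields of struct s_shell used here. */
typedef struct { int last_status; const char *argv0; } t_ctx;

/* Hypothetical fixed-size stand-in for t_xbuf (truncates silently). */
typedef struct { char data[256]; char mask[256]; size_t len; } t_sbuf;

static void sbuf_puts(t_sbuf *o, const char *s, char split)
{
    while (*s && o->len + 1 < sizeof o->data)
    {
        o->data[o->len] = *s++;
        o->mask[o->len++] = split;
    }
    o->data[o->len] = '\0';
}

/* input[*pos] is the '$'. On return *pos is just past the consumed text.
   Always returns 0 here; the real function can fail on allocation. */
int sketch_expand_dollar(const t_ctx *ctx, const char *input, size_t *pos,
                         int dq, t_sbuf *out)
{
    char        name[128];
    char        num[32];
    const char  *val;
    size_t      i = *pos + 1;           /* skip '$' */
    size_t      n = 0;
    int         braced = 0;

    if (input[i] == '{')
    {
        braced = 1;
        i++;
    }
    if (input[i] == '?' || input[i] == '$' || input[i] == '0')
        name[n++] = input[i++];         /* special parameter: one byte */
    else if (isalpha((unsigned char)input[i]) || input[i] == '_')
    {
        while ((isalnum((unsigned char)input[i]) || input[i] == '_')
            && n + 1 < sizeof name)
            name[n++] = input[i++];
    }
    name[n] = '\0';
    if (n == 0 || (braced && input[i] != '}'))
    {
        sbuf_puts(out, "$", 0);         /* no recognised form: literal '$' */
        *pos += 1;
        return (0);
    }
    if (braced)
        i++;                            /* skip '}' */
    if (name[0] == '?')
    {
        snprintf(num, sizeof num, "%d", ctx->last_status);
        val = num;
    }
    else if (name[0] == '$')
    {
        snprintf(num, sizeof num, "%ld", (long)getpid());
        val = num;
    }
    else if (name[0] == '0')
        val = ctx->argv0 ? ctx->argv0 : "";
    else if ((val = getenv(name)) == NULL)
        val = "";                       /* unknown variable: empty string */
    sbuf_puts(out, val, dq ? 0 : 1);    /* quoted output is never split */
    *pos = i;
    return (0);
}
```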

int expand_tilde_at(struct s_shell *shell, const char *input, size_t *pos, t_xbuf *out)

Read a tilde sequence beginning at input[*pos].

Recognises a leading “~” or “~user” and replaces it with $HOME or the named user’s home directory. If neither resolves, the original text is emitted literally.

Returns:

0 on success, -1 on allocation failure.

char **field_split(struct s_shell *shell, const t_xbuf *expanded)

Field-split an already-expanded buffer on $IFS.

Honours the buffer’s split mask: a character is a candidate delimiter only when both (a) it appears in $IFS and (b) its mask byte is 1. Empty $IFS disables splitting entirely.

Returns:

Newly-allocated NULL-terminated array of strings, or NULL on allocation failure.

int expand_arithmetic(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)

Expand an arithmetic expression $((expr)).

Checks the nesting depth of the double parentheses, then calls arith_eval on the enclosed expression.

Returns:

0 on success, -1 on error.
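The depth check can be sketched as a scan that counts nested parentheses inside $(( ... )) and stops at the first )) seen at depth zero. This is an illustrative version under assumptions: the caller has already verified that input[*pos] starts with "$((", and sketch_find_arith is a hypothetical helper that only locates the expression; the enclosed text would then be handed to arith_eval.

```c
#include <stddef.h>

/* input + *pos starts with "$((" (checked by the caller). On success,
   *start and *len describe the enclosed expression, *pos is advanced
   past the closing "))" and 0 is returned; -1 means unmatched parens. */
int sketch_find_arith(const char *input, size_t *pos,
                      size_t *start, size_t *len)
{
    size_t  i = *pos + 3;               /* skip "$((" */
    int     depth = 0;                  /* parens opened inside the expr */

    *start = i;
    while (input[i])
    {
        if (input[i] == '(')
            depth++;
        else if (input[i] == ')' && depth > 0)
            depth--;
        else if (input[i] == ')')
        {
            if (input[i + 1] != ')')
                return (-1);            /* "$((...)" without a second ')' */
            *len = i - *start;
            *pos = i + 2;
            return (0);
        }
        i++;
    }
    return (-1);                        /* end of word reached: unmatched */
}
```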

int expand_cmdsub(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)

Expand a $( ... ) command substitution.

Forks a child whose stdout is captured via a pipe; the inner script runs through the same lexer/parser/executor as a normal command line. Trailing newlines are stripped per POSIX 2.6.3. Unquoted (dq == 0) output is pushed with split=1 so field splitting still runs.

Returns:

0 on success, -1 on allocation / unmatched-paren errors.
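The capture step can be sketched with pipe(), fork() and dup2(). Assumptions: the inner script is run by a hypothetical t_runner callback instead of the real lexer/parser/executor, the child's exit status is ignored, and the result is returned as a plain string rather than pushed into a t_xbuf; demo_runner exists only to make the example runnable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for running a script through the shell. */
typedef void (*t_runner)(const char *script);

/* Hypothetical runner used by the example: echoes the script text. */
void demo_runner(const char *script)
{
    printf("%s\n\n", script);
}

/* Returns the child's stdout with trailing newlines removed, or NULL. */
char *sketch_capture(const char *script, t_runner run)
{
    int     fds[2];
    pid_t   pid;
    char    chunk[512];
    char    *buf = NULL;
    char    *tmp;
    size_t  len = 0;
    ssize_t n;
    int     failed = 0;

    fflush(stdout);                     /* avoid duplicating pending output */
    if (pipe(fds) < 0)
        return (NULL);
    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return (NULL);
    }
    if (pid == 0)                       /* child: stdout goes into the pipe */
    {
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[1]);
        run(script);
        fflush(stdout);
        _exit(0);
    }
    close(fds[1]);
    while (!failed && (n = read(fds[0], chunk, sizeof chunk)) > 0)
    {
        tmp = realloc(buf, len + (size_t)n + 1);  /* simple, not amortised */
        if (!tmp)
            failed = 1;
        else
        {
            buf = tmp;
            memcpy(buf + len, chunk, (size_t)n);
            len += (size_t)n;
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (failed)
    {
        free(buf);
        return (NULL);
    }
    if (!buf)
        buf = calloc(1, 1);             /* no output: empty string */
    if (!buf)
        return (NULL);
    while (len > 0 && buf[len - 1] == '\n')
        len--;                          /* POSIX 2.6.3: strip trailing newlines */
    buf[len] = '\0';
    return (buf);
}
```

Only trailing newlines are removed; the newline inside "a\nb" survives and, when the substitution is unquoted, is later pushed with split=1 so field splitting can treat it as a boundary.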

int arith_eval(const char *expr, long long int *result)

Evaluate a plain arithmetic expression string to a long long int.

Implements a recursive descent parser for the following grammar:

  expr   -> term (('+' | '-') term)*
  term   -> factor (('*' | '/' | '%') factor)*
  factor -> '(' expr ')' | ['-'] NUMBER

Operator precedence and left-associativity are handled naturally by the call chain. Whitespace between tokens is ignored.

Returns:

0 on success, -1 on error.
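The grammar maps one-to-one onto three mutually recursive functions. The sketch below is a minimal illustration, assuming the t_arith members s (input), i (cursor) and error (flag) have the meanings their names suggest; treating division or modulo by zero as an error and ignoring overflow are assumptions of the sketch, not documented behaviour.

```c
#include <ctype.h>

typedef struct s_arith
{
    const char  *s;     /* expression text */
    int         i;      /* current position in s */
    int         error;  /* set to 1 on a syntax or math error */
}   t_arith;

static long long    parse_expr(t_arith *a);

static void skip_ws(t_arith *a)
{
    while (isspace((unsigned char)a->s[a->i]))
        a->i++;
}

/* factor -> '(' expr ')' | ['-'] NUMBER */
static long long parse_factor(t_arith *a)
{
    long long   v = 0;
    int         neg = 0;

    skip_ws(a);
    if (a->s[a->i] == '(')
    {
        a->i++;
        v = parse_expr(a);
        skip_ws(a);
        if (a->error || a->s[a->i] != ')')
        {
            a->error = 1;
            return (0);
        }
        a->i++;
        return (v);
    }
    if (a->s[a->i] == '-')
    {
        neg = 1;
        a->i++;
        skip_ws(a);
    }
    if (!isdigit((unsigned char)a->s[a->i]))
    {
        a->error = 1;
        return (0);
    }
    while (isdigit((unsigned char)a->s[a->i]))
        v = v * 10 + (a->s[a->i++] - '0');  /* overflow not checked */
    return (neg ? -v : v);
}

/* term -> factor (('*' | '/' | '%') factor)* */
static long long parse_term(t_arith *a)
{
    long long   v = parse_factor(a);
    long long   r;
    char        op;

    while (!a->error)
    {
        skip_ws(a);
        op = a->s[a->i];
        if (op != '*' && op != '/' && op != '%')
            break ;
        a->i++;
        r = parse_factor(a);
        if (!a->error && op != '*' && r == 0)
            a->error = 1;               /* assumed: division by zero fails */
        if (a->error)
            break ;
        if (op == '*')
            v *= r;
        else if (op == '/')
            v /= r;
        else
            v %= r;
    }
    return (v);
}

/* expr -> term (('+' | '-') term)*, evaluated left to right */
static long long parse_expr(t_arith *a)
{
    long long   v = parse_term(a);
    char        op;

    while (!a->error)
    {
        skip_ws(a);
        op = a->s[a->i];
        if (op != '+' && op != '-')
            break ;
        a->i++;
        if (op == '+')
            v += parse_term(a);
        else
            v -= parse_term(a);
    }
    return (v);
}

int arith_eval(const char *expr, long long *result)
{
    t_arith     a = {expr, 0, 0};
    long long   v = parse_expr(&a);

    skip_ws(&a);
    if (a.error || a.s[a.i] != '\0')
        return (-1);                    /* syntax error or trailing garbage */
    *result = v;
    return (0);
}
```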

long long int parse_expr(t_arith *a)

Parse and evaluate the top-level arithmetic expression.

Handles additive operators (‘+’ and ‘-’) with left-to-right associativity by repeatedly calling parse_term() for each operand. Stops as soon as a non-additive token is encountered or the end of the string is reached. Short-circuits immediately if a->error is set by a deeper call, leaving the position unchanged so arith_eval() can detect and report trailing garbage.

Returns:

The computed value of the parsed expression.

struct t_arith
#include <expander.h>

Helper structure to handle arithmetic expression.

Public Members

const char *s
int i
int error
struct t_xbuf
#include <expander.h>

Expansion buffer with a parallel “splittable” mask.

Phase 1 of expansion writes into a t_xbuf instead of a plain C string so that field splitting can later distinguish IFS characters that came from an unquoted expansion (splittable) from IFS characters that were literal-quoted in the source word (not splittable).

For every byte stored in data there is a parallel byte in mask: 1 means “IFS-splittable”, 0 means “literal - never a field boundary”.

Note

Both buffers are NUL-terminated so the payload can be inspected with the regular string functions.

Public Members

char *data

NUL-terminated expanded text.

char *mask

Parallel mask: 1 = splittable, 0 = literal.

size_t len

Number of bytes currently stored (excluding NUL).

size_t cap

Allocated capacity of data and mask.

Expander

Public expander API: expand_word, expand_word_to_fields, expand_command.

Thin glue between the executor and the lower-level expansion helpers. All allocation/error contracts are documented in expander.h; this file only implements them on top of the t_xbuf and helpers from expand_word.c, expand_parameter.c, expand_tilde.c and field_split.c.

Author

pulgamecanica

Functions

char *expand_word(t_shell *shell, const char *word)
static int word_has_quote(const char *word)

True if the raw token word contains at least one quote (' or "), skipping past backslash escapes so \" doesn't count.

Used to detect words like "" or "$X" (X unset): their expansion is empty byte-wise, yet POSIX 2.6.5 requires one empty argv field to be emitted (so printf "[%s]" "$X" foo prints [][foo], not [foo]).
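The quote-detection rule can be sketched in a few lines; sketch_word_has_quote is a hypothetical name for this illustration. Returning at the first unescaped quote is what makes the backslash handling safe: every backslash the loop examines lies outside any quoted region, where it really does escape the next byte.

```c
#include <stddef.h>

/* True if word contains a ' or " that is not preceded by a backslash. */
int sketch_word_has_quote(const char *word)
{
    size_t  i = 0;

    while (word[i])
    {
        if (word[i] == '\\' && word[i + 1])
        {
            i += 2;                     /* skip the escaped byte */
            continue ;
        }
        if (word[i] == '\'' || word[i] == '"')
            return (1);
        i++;
    }
    return (0);
}
```

When this returns 1 but the expansion produced no bytes, the caller emits one_empty_field() ({"", NULL}) instead of a zero-element array.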

static char **one_empty_field(void)

Produce an argv array of exactly one empty string ({"", NULL}).

char **expand_word_to_fields(t_shell *shell, const char *word)
static void free_argv(char **argv)

Free a NULL-terminated argv array (each string + the array).

static int argv_append_all(char ***dst, int dst_count, char **src)

Append src (a NULL-terminated array) onto *dst, growing *dst as needed.

Takes ownership of src's elements; frees src itself. On failure both arrays are left in a recoverable state (src is freed, *dst keeps the elements collected so far so the caller can free it via free_argv).

Returns:

New element count of *dst, or -1 on failure.

static int expand_argv(t_shell *shell, t_cmd *cmd)

Replace cmd->argv with the field-split expansion of every word.

static char *expand_assignment(t_shell *shell, const char *original)

Rebuild “NAME=value” with value expanded as a single string.

static int expand_assignments(t_shell *shell, t_cmd *cmd)

Walk cmd->assignments and expand each value in place.

static int is_heredoc(t_token_type type)

True if type names a heredoc operator (whose target is the delimiter and must NOT be expanded as a filename).

static void report_ambiguous_redir(const char *target)

Report “<target>: ambiguous redirect” to stderr.

static char *expand_redir_target(t_shell *shell, const char *word)

Expand a redir target with field splitting and reject ambiguity.

After full expansion + field splitting, the target must collapse to exactly one field. Zero fields (e.g. an empty unquoted variable) and >1 fields are both rejected as ambiguous, matching bash behaviour.

Returns:

Newly-allocated single-field string on success, or NULL on allocation failure / ambiguity (error already reported).

static int expand_redirs(t_shell *shell, t_cmd *cmd)

Walk cmd->redirs and expand each non-heredoc target in place.

int expand_command(t_shell *shell, t_cmd *cmd)

Field Split

Split an expanded buffer on $IFS, honouring the split mask.

Field splitting (POSIX 2.6.5) only applies to bytes that came from an unquoted expansion - exactly the bytes whose mask byte is 1 in t_xbuf. Literal bytes (including spaces inside “…” or ‘…’) are preserved unconditionally.

Author

pulgamecanica

Splitting rules implemented (matching dash / bash):

  • Default IFS is “ \t\n” (space, tab, newline).

  • Leading IFS whitespace is dropped; a run of IFS whitespace between two values is one delimiter (no empty field generated).

  • Each non-whitespace IFS byte is a delimiter. IFS whitespace adjacent to it is consumed with it.

  • Adjacent non-whitespace IFS delimiters produce an empty field between them.

  • A trailing delimiter does NOT produce a trailing empty field.

  • IFS = “” disables splitting entirely (single field).

  • An empty buffer yields zero fields.

Functions

static int is_ifs_split(const t_xbuf *buf, size_t i, const char *ifs)

True if buf->data[i] is in ifs and the mask allows splitting.

static int is_ifs_ws(const char *ifs, char c)

True if c is one of POSIX’s three IFS-whitespace bytes (space, tab, newline) AND it appears in ifs.

static size_t push_field(t_field **fields, size_t *count, size_t cap, t_field f)

Append a field record to a growing array.

Returns:

New capacity, or 0 on allocation failure (caller frees).

static void skip_ifs_ws(const t_xbuf *buf, const char *ifs, size_t *i)

Skip a run of IFS whitespace starting at *i.

static void consume_delimiter(const t_xbuf *buf, const char *ifs, size_t *i)

Consume one delimiter at *i: optional non-ws byte plus any surrounding IFS whitespace.

static t_field *collect_fields(const t_xbuf *buf, const char *ifs, size_t *out_count)

Walk the buffer and record each field.

Returns:

Field array (caller frees) or NULL on allocation failure. When the buffer expands to zero fields, *out_count is 0 and the returned pointer is NULL - that is not an error.

static char **materialise(const t_xbuf *buf, t_field *fields, size_t count)

Convert collected (start, len) records to a NULL-terminated array.

static char **single_field(const t_xbuf *buf)

Build a single-field array holding the buffer verbatim.

static char **empty_array(void)

Build a zero-field result (single NULL pointer).

char **field_split(t_shell *shell, const t_xbuf *expanded)
struct t_field

Field record (offset, length) into the expansion buffer.

We collect these first and only allocate the t_strdup-style char* array once we know the final count.

Public Members

size_t start
size_t len