Expander
Word expansion: quotes, parameter, tilde, field splitting.
The expander is invoked by the executor right before a simple command runs. It performs the POSIX expansion pipeline on the raw token strings produced by the lexer/parser:
- Author
pulgamecanica
Tilde expansion ~ ~user
Parameter expansion $VAR ${VAR} $? $$ $0
Quote handling ' " \ semantics
Field splitting on $IFS (only on unquoted expansion output)
Quote removal done in-line during the char-by-char loop
Command substitution $(), arithmetic $(()) and pathname (glob) expansion are listed as modular features in plan/04_expander.md and are not implemented in this module yet.
Functions
-
char *expand_word(struct s_shell *shell, const char *word)
Expand a word to a single string (no field splitting, no globbing).
Used for assignment values and redirection targets.
- Parameters:
shell – The shell instance.
word – The raw token string from the parser.
- Returns:
Newly-allocated string. Caller frees. NULL on allocation failure or when word is NULL.
-
char **expand_word_to_fields(struct s_shell *shell, const char *word)
Expand a word to multiple fields (argv words).
Performs all expansions plus field splitting on $IFS.
- Parameters:
shell – The shell instance.
word – The raw token string from the parser.
- Returns:
Newly-allocated NULL-terminated array of strings. Caller frees each element and the array itself. May return a zero-element array (just a NULL terminator) when the word expanded to nothing splittable. NULL on allocation failure or when word is NULL.
-
int expand_command(struct s_shell *shell, t_cmd *cmd)
Expand a whole simple command in place: argv, assignments, redirs.
Replaces cmd->argv with a new array containing the expanded fields, updates cmd->argc, rewrites each “NAME=value” in cmd->assignments, and rewrites redir->target on every non-heredoc redirection.
- Parameters:
shell – The shell instance.
cmd – The command node to expand.
- Returns:
0 on success, -1 on allocation failure.
-
int xbuf_putc(t_xbuf *buf, char c, char split)
Append one byte plus its split-mask flag.
- Parameters:
buf – The buffer to append to.
c – The character to append.
split – 1 if c is subject to IFS splitting, 0 if literal.
- Returns:
0 on success, -1 on allocation failure.
-
int xbuf_puts(t_xbuf *buf, const char *s, char split)
Append a NUL-terminated string with a uniform split flag.
- Parameters:
buf – The buffer to append to.
s – The string to append (may be NULL → no-op success).
split – 1 if every byte of s is splittable, 0 if literal.
- Returns:
0 on success, -1 on allocation failure.
-
int expand_word_into(struct s_shell *shell, const char *word, t_xbuf *out)
Run the char-by-char expansion loop on word into out.
Handles single quotes (no expansion), double quotes (parameter expansion only), tilde at word start, $-expansions, and backslash escapes per POSIX.
- Returns:
0 on success, -1 on allocation failure.
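The quote rules in this loop can be illustrated with a much smaller routine. The sketch below is not the project's code: sketch_remove_quotes is a hypothetical name, it performs quote removal only (no $ or ~ expansion, no split mask), and it ignores backslash-newline continuation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `word` with the
 * quoting characters removed. Expansion is deliberately left out. */
char *sketch_remove_quotes(const char *word)
{
    size_t i = 0;
    size_t n = 0;
    int in_dq = 0;
    char *out;

    assert(word != NULL);
    out = malloc(strlen(word) + 1); /* quote removal never grows the text */
    if (out == NULL)
        return NULL;
    while (word[i] != '\0')
    {
        if (!in_dq && word[i] == '\'')
        {
            /* Single quotes: copy verbatim up to the closing quote. */
            i++;
            while (word[i] != '\0' && word[i] != '\'')
                out[n++] = word[i++];
            if (word[i] == '\'')
                i++;
        }
        else if (word[i] == '"')
        {
            in_dq = !in_dq; /* the quote itself is dropped */
            i++;
        }
        else if (word[i] == '\\' && word[i + 1] != '\0')
        {
            /* Inside "...", a backslash only escapes $ ` " and \ ;
             * before any other byte it is kept as a literal backslash. */
            if (in_dq && strchr("$`\"\\", word[i + 1]) == NULL)
                out[n++] = word[i];
            out[n++] = word[i + 1];
            i += 2;
        }
        else
            out[n++] = word[i++];
    }
    out[n] = '\0';
    return out;
}
```

Under these assumptions, the raw token 'a b'"c\"d"e\ f becomes a bc"de f, and the caller frees the result.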
-
int expand_dollar(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)
Read a $… sequence beginning at input[*pos].
Supports $?, $$, $0, $NAME and ${NAME} (including ${?}, ${$}, ${0}). Unknown variables expand to the empty string. A bare $ followed by no recognised form is emitted literally.
- Parameters:
shell – The shell instance (for variable lookup and $?).
input – The full word string.
pos – In/out: byte index, advanced past the consumed sequence.
dq – 1 when the $-sequence is inside double quotes (suppresses split).
out – Buffer to append the expanded value to.
- Returns:
0 on success, -1 on allocation failure.
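As an illustration of the in/out pos contract, the following sketch only scans the name part of a $-sequence and leaves variable lookup to the caller. The function name sketch_dollar_name, its signature, and the fixed-size name buffer are assumptions made for this example, not part of expander.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the parameter name starting at s: 1 for the special
 * parameters ? $ 0, the identifier length for NAME, 0 if none. */
static size_t sk_name_span(const char *s)
{
    size_t n = 0;

    if (s[0] == '?' || s[0] == '$' || s[0] == '0')
        return 1;
    if (!isalpha((unsigned char)s[0]) && s[0] != '_')
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Hypothetical scanner: input[*pos] must be '$'. On a recognised form,
 * copies the name into `name`, moves *pos past the whole sequence and
 * returns 1. Otherwise moves *pos past the '$' only and returns 0, so
 * the caller can emit the '$' literally. */
int sketch_dollar_name(const char *input, size_t *pos, char *name, size_t cap)
{
    const char *s;
    size_t n;
    int braced;

    assert(input != NULL && pos != NULL && name != NULL);
    assert(input[*pos] == '$');
    s = input + *pos + 1;
    braced = (*s == '{');
    if (braced)
        s++;
    n = sk_name_span(s);
    if (n == 0 || n >= cap || (braced && s[n] != '}'))
    {
        *pos += 1;
        return 0;
    }
    memcpy(name, s, n);
    name[n] = '\0';
    *pos = (size_t)(s - input) + n + (braced ? 1 : 0);
    return 1;
}
```

With pos pointing at the $ in a${?}b, this sketch stores ? in name and leaves pos at the index of b.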
-
int expand_tilde_at(struct s_shell *shell, const char *input, size_t *pos, t_xbuf *out)
Read a tilde sequence beginning at input[*pos].
Recognises a leading “~” or “~user” and replaces it with $HOME or the named user’s home directory. If neither resolves, the original text is emitted literally.
- Returns:
0 on success, -1 on allocation failure.
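One plausible way to resolve the two forms is getenv("HOME") for a bare tilde and getpwnam() for ~user. The sketch below assumes exactly that; sketch_tilde_dir and its consumed out-parameter are hypothetical and do not mirror the real expand_tilde_at signature.

```c
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver: for a word starting with ~ or ~user, store the
 * prefix length in *consumed and return the replacement directory, or
 * NULL when it cannot be resolved (the caller then keeps the text
 * literally). The returned pointer belongs to libc; copy it to keep it. */
const char *sketch_tilde_dir(const char *word, size_t *consumed)
{
    char user[64];
    size_t n = 1;
    struct passwd *pw;

    assert(word != NULL && consumed != NULL);
    if (word[0] != '~')
        return NULL;
    while (word[n] != '\0' && word[n] != '/')
        n++;                     /* the prefix ends at '/' or end of word */
    *consumed = n;
    if (n == 1)
        return getenv("HOME");   /* bare "~" */
    if (n - 1 >= sizeof user)
        return NULL;             /* login name too long for this sketch */
    memcpy(user, word + 1, n - 1);
    user[n - 1] = '\0';
    pw = getpwnam(user);         /* "~user": look up the passwd entry */
    return (pw != NULL) ? pw->pw_dir : NULL;
}
```

For ~/x the sketch reports a one-byte prefix and returns whatever HOME currently holds, which may be NULL if the variable is unset.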
-
char **field_split(struct s_shell *shell, const t_xbuf *expanded)
Field-split an already-expanded buffer on $IFS.
Honours the buffer’s split mask: a character is a candidate delimiter only when both (a) it appears in $IFS and (b) its mask byte is 1. Empty $IFS disables splitting entirely.
- Returns:
Newly-allocated NULL-terminated array of strings, or NULL on allocation failure.
-
int expand_arithmetic(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)
Expand an arithmetic expression $((expr)).
Checks the nesting depth of the double parentheses and calls arith_eval on the enclosed expression.
- Returns:
0 on success, -1 on failure.
-
int expand_cmdsub(struct s_shell *shell, const char *input, size_t *pos, int dq, t_xbuf *out)
Expand a $( ... ) command substitution.
Forks a child whose stdout is captured via a pipe; the inner script runs through the same lexer/parser/executor as a normal command line. Trailing newlines are stripped per POSIX 2.6.3. Unquoted (dq == 0) output is pushed with split=1 so field splitting still runs.
- Returns:
0 on success, -1 on allocation / unmatched-paren errors.
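The fork-and-pipe capture can be sketched in isolation. Note the substitution: the documented function runs the inner script through the shell's own lexer/parser/executor, whereas this example hands it to /bin/sh -c so that it stays self-contained. sketch_capture is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `script` in a child, capture its stdout and
 * return it with trailing newlines stripped. Caller frees. */
char *sketch_capture(const char *script)
{
    int fds[2];
    pid_t pid;
    char chunk[256];
    char *out;
    char *tmp;
    size_t len = 0;
    ssize_t n;

    assert(script != NULL);
    if (pipe(fds) != 0)
        return NULL;
    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0)
    {
        /* Child: stdout becomes the write end of the pipe. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);
    }
    close(fds[1]); /* parent only reads */
    out = malloc(1);
    while (out != NULL && (n = read(fds[0], chunk, sizeof chunk)) > 0)
    {
        tmp = realloc(out, len + (size_t)n + 1);
        if (tmp == NULL)
        {
            free(out);
            out = NULL;
        }
        else
        {
            out = tmp;
            memcpy(out + len, chunk, (size_t)n);
            len += (size_t)n;
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (out == NULL)
        return NULL;
    while (len > 0 && out[len - 1] == '\n')
        len--; /* strip every trailing newline */
    out[len] = '\0';
    return out;
}
```

A caller in the real expander would then push the captured bytes into the output buffer with split set according to dq; that step is omitted here.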
-
int arith_eval(const char *expr, long long int *result)
Evaluate a plain arithmetic expression string to a long long int.
Implements a recursive descent parser for the following grammar:
    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/' | '%') factor)*
    factor -> '(' expr ')' | ['-'] NUMBER
Operator precedence and left-associativity are handled naturally by the call chain. Whitespace between tokens is ignored.
- Returns:
0 on success, -1 on error (for example trailing garbage after the expression).
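A compact recursive descent evaluator for a grammar of this shape might look as follows. It is a sketch under assumptions: the state struct and all sk_ names are hypothetical (the real t_arith fields are not documented here), the third multiplicative operator is taken to be %, division by zero is treated as an error, and overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parser state. */
typedef struct s_sk_arith
{
    const char *s;     /* cursor into the expression text */
    int         error; /* 1 after a syntax error or division by zero */
}   t_sk_arith;

static long long sk_expr(t_sk_arith *a);

/* Skip whitespace and return the next byte without consuming it. */
static char sk_peek(t_sk_arith *a)
{
    while (isspace((unsigned char)*a->s))
        a->s++;
    return *a->s;
}

/* factor -> '(' expr ')' | ['-'] NUMBER */
static long long sk_factor(t_sk_arith *a)
{
    long long v;
    int neg;
    char *end;

    if (sk_peek(a) == '(')
    {
        a->s++;
        v = sk_expr(a);
        if (a->error || sk_peek(a) != ')')
            a->error = 1;
        else
            a->s++;
        return v;
    }
    neg = (*a->s == '-');
    if (neg)
        a->s++;
    if (!isdigit((unsigned char)*a->s))
    {
        a->error = 1;
        return 0;
    }
    v = strtoll(a->s, &end, 10);
    a->s = end;
    return neg ? -v : v;
}

/* term -> factor (('*' | '/' | '%') factor)* */
static long long sk_term(t_sk_arith *a)
{
    long long v = sk_factor(a);
    long long rhs;
    char op;

    while (!a->error && ((op = sk_peek(a)) == '*' || op == '/' || op == '%'))
    {
        a->s++;
        rhs = sk_factor(a);
        if (a->error || (op != '*' && rhs == 0))
        {
            a->error = 1;
            return 0;
        }
        if (op == '*')
            v *= rhs;
        else if (op == '/')
            v /= rhs;
        else
            v %= rhs;
    }
    return v;
}

/* expr -> term (('+' | '-') term)* */
static long long sk_expr(t_sk_arith *a)
{
    long long v = sk_term(a);
    long long rhs;
    char op;

    while (!a->error && ((op = sk_peek(a)) == '+' || op == '-'))
    {
        a->s++;
        rhs = sk_term(a);
        if (a->error)
            return 0;
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Returns 0 and stores the value, or -1 on error or trailing garbage. */
int sketch_arith_eval(const char *expr, long long *result)
{
    t_sk_arith a;
    long long v;

    assert(expr != NULL && result != NULL);
    a.s = expr;
    a.error = 0;
    v = sk_expr(&a);
    if (a.error || sk_peek(&a) != '\0')
        return -1;
    *result = v;
    return 0;
}
```

Because each level loops over its own operators and calls the next level for operands, 10 - 4 - 3 groups as (10 - 4) - 3 and 1 + 2 * 3 multiplies first, without any explicit precedence table.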
-
long long int parse_expr(t_arith *a)
Parse and evaluate the top-level arithmetic expression.
Handles additive operators (‘+’ and ‘-’) with left-to-right associativity by repeatedly calling parse_term() for each operand. Stops as soon as a non-additive token is encountered or the end of the string is reached. Short-circuits immediately if a->error is set by a deeper call, leaving the position unchanged so arith_eval() can detect and report trailing garbage.
- Returns:
The computed value of the parsed expression.
-
struct t_arith
- #include <expander.h>
Helper structure used while evaluating an arithmetic expression.
-
struct t_xbuf
- #include <expander.h>
Expansion buffer with a parallel “splittable” mask.
Phase 1 of expansion writes into a t_xbuf instead of a plain C string so that field splitting can later distinguish IFS characters that came from an unquoted expansion (splittable) from IFS characters that were literal-quoted in the source word (not splittable).
For every byte stored in data there is a parallel byte in mask: 1 means “IFS-splittable”, 0 means “literal - never a field boundary”.
Note: Both buffers are NUL-terminated so the payload can be inspected with the regular string functions.
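The parallel-mask idea can be made concrete with a small stand-in. The struct layout, the growth policy and every sketch_ name below are assumptions for illustration; only the data/mask pairing and the 0 / -1 return convention come from the descriptions above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for t_xbuf. */
typedef struct s_sketch_xbuf
{
    char   *data; /* expanded bytes, NUL-terminated */
    char   *mask; /* 1 = IFS-splittable, 0 = literal, same length as data */
    size_t  len;  /* bytes stored, terminator excluded */
    size_t  cap;  /* allocated size of each buffer */
}   t_sketch_xbuf;

/* Make room for `extra` more bytes plus the terminator in both buffers. */
static int sketch_xbuf_reserve(t_sketch_xbuf *buf, size_t extra)
{
    size_t need = buf->len + extra + 1;
    size_t cap = buf->cap ? buf->cap : 16;
    char *p;

    if (need <= buf->cap)
        return 0;
    while (cap < need)
        cap *= 2;
    p = realloc(buf->data, cap);
    if (p == NULL)
        return -1;
    buf->data = p;
    p = realloc(buf->mask, cap);
    if (p == NULL)
        return -1;
    buf->mask = p;
    buf->cap = cap;
    return 0;
}

/* Append one byte together with its split flag. */
int sketch_xbuf_putc(t_sketch_xbuf *buf, char c, char split)
{
    assert(buf != NULL);
    if (sketch_xbuf_reserve(buf, 1) != 0)
        return -1;
    buf->data[buf->len] = c;
    buf->mask[buf->len] = split;
    buf->len++;
    buf->data[buf->len] = '\0';
    buf->mask[buf->len] = 0;
    return 0;
}

/* Append a string with a uniform split flag; NULL is a no-op success. */
int sketch_xbuf_puts(t_sketch_xbuf *buf, const char *s, char split)
{
    assert(buf != NULL);
    if (s == NULL)
        return 0;
    while (*s != '\0')
        if (sketch_xbuf_putc(buf, *s++, split) != 0)
            return -1;
    return 0;
}
```

A zero-initialised struct is a valid empty buffer in this sketch. Appending the value of an unquoted variable with split=1 and a quoted literal with split=0 leaves both in one string while still recording which spaces may become field boundaries.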
Expander
Public expander API: expand_word, expand_word_to_fields, expand_command.
Thin glue between the executor and the lower-level expansion helpers. All allocation/error contracts are documented in expander.h; this file only implements them on top of the t_xbuf and helpers from expand_word.c, expand_parameter.c, expand_tilde.c and field_split.c.
- Author
pulgamecanica
Functions
-
char *expand_word(t_shell *shell, const char *word)
-
static int word_has_quote(const char *word)
True if the raw token word contains at least one quote (' or "), skipping past backslash-escapes so \" doesn't count.
Used to detect words like "" or "$X" (X unset): their expansion is empty bytes-wise, yet POSIX 2.6.5 requires one empty argv field to be emitted (so printf "[%s]" "$X" foo prints [][foo], not [foo]).
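A scan of this kind fits in a few lines. The version below is a sketch with a hypothetical name, written only from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical equivalent of the quote scan: returns 1 if `word`
 * contains an unescaped ' or ", 0 otherwise. */
int sketch_has_quote(const char *word)
{
    size_t i = 0;

    assert(word != NULL);
    while (word[i] != '\0')
    {
        if (word[i] == '\\' && word[i + 1] != '\0')
            i += 2; /* skip the backslash and the byte it escapes */
        else if (word[i] == '\'' || word[i] == '"')
            return 1;
        else
            i++;
    }
    return 0;
}
```

A caller could combine this with an empty split result: when the word had a quote but produced zero fields, it emits one empty field instead.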
-
static char **one_empty_field(void)
Produce an argv array of exactly one empty string ({"", NULL}).
-
char **expand_word_to_fields(t_shell *shell, const char *word)
-
static void free_argv(char **argv)
Free a NULL-terminated argv array (each string + the array).
-
static int argv_append_all(char ***dst, int dst_count, char **src)
Append src (a NULL-terminated array) onto *dst, growing.
Takes ownership of src's elements; frees src itself. On failure both arrays are left in a recoverable state (src is freed, *dst keeps the elements collected so far so the caller can free it via free_argv).
- Returns:
New element count of *dst, or -1 on failure.
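The ownership transfer can be sketched as below. The name is hypothetical, and one detail is an assumption: on failure this version also frees the strings held by src, since the description only states that src itself is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical append: moves every string of `src` to the end of *dst.
 * Returns the new element count, or -1 on allocation failure. */
int sketch_argv_append_all(char ***dst, int dst_count, char **src)
{
    char **grown;
    int n = 0;
    int i;

    assert(dst != NULL && src != NULL && dst_count >= 0);
    while (src[n] != NULL)
        n++;
    grown = realloc(*dst, sizeof(char *) * (size_t)(dst_count + n + 1));
    if (grown == NULL)
    {
        /* *dst is untouched and still NULL-terminated. Assumption: the
         * strings of src are released here to avoid a leak. */
        for (i = 0; i < n; i++)
            free(src[i]);
        free(src);
        return -1;
    }
    for (i = 0; i < n; i++)
        grown[dst_count + i] = src[i]; /* ownership moves, no copy */
    grown[dst_count + n] = NULL;
    *dst = grown;
    free(src); /* only the container; its strings now live in *dst */
    return dst_count + n;
}
```

Starting from a NULL *dst with dst_count 0 works because realloc(NULL, size) behaves like malloc.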
-
static int expand_argv(t_shell *shell, t_cmd *cmd)
Replace cmd->argv with the field-split expansion of every word.
-
static char *expand_assignment(t_shell *shell, const char *original)
Rebuild “NAME=value” with value expanded as a single string.
-
static int expand_assignments(t_shell *shell, t_cmd *cmd)
Walk cmd->assignments and expand each value in place.
-
static int is_heredoc(t_token_type type)
True if type names a heredoc operator (whose target is the delimiter and must NOT be expanded as a filename).
-
static void report_ambiguous_redir(const char *target)
Report “<target>: ambiguous redirect” to stderr.
-
static char *expand_redir_target(t_shell *shell, const char *word)
Expand a redir target with field splitting and reject ambiguity.
After full expansion + field splitting, the target must collapse to exactly one field. Zero fields (e.g. an empty unquoted variable) and >1 fields are both rejected as ambiguous, matching bash behaviour.
- Returns:
Newly-allocated single-field string on success, or NULL on allocation failure / ambiguity (error already reported).
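The one-field rule can be isolated from the expansion itself. In the sketch below, which uses a hypothetical name, the caller has already produced the field array; the function either hands back the single field or reports the ambiguity and frees everything.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check: takes ownership of `fields` (NULL-terminated).
 * Returns the only field when there is exactly one, otherwise prints
 * "<word>: ambiguous redirect" to stderr and returns NULL. */
char *sketch_pick_redir_target(char **fields, const char *word)
{
    size_t n = 0;
    size_t i;
    char *only;

    assert(fields != NULL && word != NULL);
    while (fields[n] != NULL)
        n++;
    if (n == 1)
    {
        only = fields[0];
        free(fields); /* keep the string, drop the container */
        return only;
    }
    fprintf(stderr, "%s: ambiguous redirect\n", word);
    for (i = 0; i < n; i++)
        free(fields[i]);
    free(fields);
    return NULL;
}
```

Reporting with the unexpanded word rather than the expanded fields is a choice made for this sketch; the description above only fixes the message format.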
Field Split
Split an expanded buffer on $IFS, honouring the split mask.
Field splitting (POSIX 2.6.5) only applies to bytes that came from an unquoted expansion - exactly the bytes whose mask byte is 1 in t_xbuf. Literal bytes (including spaces inside “…” or ‘…’) are preserved unconditionally.
- Author
pulgamecanica
Splitting rules implemented (matching dash / bash):
Default IFS is “ \t\n”.
Leading IFS whitespace is dropped; a run of IFS whitespace between two values is one delimiter (no empty field generated).
Each non-whitespace IFS byte is a delimiter. IFS whitespace adjacent to it is consumed with it.
Adjacent non-whitespace IFS delimiters produce an empty field between them.
A trailing delimiter does NOT produce a trailing empty field.
IFS = “” disables splitting entirely (single field).
An empty buffer yields zero fields.
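The rules above can be sketched as a single pass over the data and mask arrays. This is an illustration, not the project's field_split: it takes the two arrays and the IFS string directly instead of a t_xbuf and shell, all sk_ and sketch_ names are hypothetical, and it assumes an empty buffer yields zero fields even when IFS is empty.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* IFS whitespace: one of space, tab, newline AND present in ifs. */
static int sk_is_ws(const char *ifs, char c)
{
    return ((c == ' ' || c == '\t' || c == '\n') && strchr(ifs, c) != NULL);
}

/* Delimiter candidate: mask allows splitting and the byte is in ifs. */
static int sk_is_split(const char *d, const char *m, size_t i, const char *ifs)
{
    return (m[i] != 0 && d[i] != '\0' && strchr(ifs, d[i]) != NULL);
}

static void sk_skip_ws(const char *d, const char *m, size_t len,
                       const char *ifs, size_t *i)
{
    while (*i < len && sk_is_split(d, m, *i, ifs) && sk_is_ws(ifs, d[*i]))
        (*i)++;
}

static char *sk_dup(const char *s, size_t n)
{
    char *p = malloc(n + 1);

    if (p != NULL)
    {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

static char **sk_free_all(char **v)
{
    size_t i;

    for (i = 0; v[i] != NULL; i++)
        free(v[i]);
    free(v);
    return NULL;
}

/* Returns a NULL-terminated array of fields, or NULL on allocation failure. */
char **sketch_field_split(const char *data, const char *mask, size_t len,
                          const char *ifs)
{
    char **out;
    size_t i = 0;
    size_t n = 0;
    size_t start;

    assert(data != NULL && mask != NULL && ifs != NULL);
    out = calloc(len + 2, sizeof *out); /* at most len fields, plus NULL */
    if (out == NULL)
        return NULL;
    if (ifs[0] == '\0' && len > 0)
    {
        out[0] = sk_dup(data, len); /* empty IFS: one field, verbatim */
        return (out[0] != NULL) ? out : sk_free_all(out);
    }
    sk_skip_ws(data, mask, len, ifs, &i); /* drop leading IFS whitespace */
    while (i < len)
    {
        start = i;
        while (i < len && !sk_is_split(data, mask, i, ifs))
            i++;
        out[n] = sk_dup(data + start, i - start);
        if (out[n] == NULL)
            return sk_free_all(out);
        n++;
        /* One delimiter: whitespace, optional non-ws byte, whitespace. */
        sk_skip_ws(data, mask, len, ifs, &i);
        if (i < len && sk_is_split(data, mask, i, ifs))
        {
            i++;
            sk_skip_ws(data, mask, len, ifs, &i);
        }
    }
    return out; /* calloc already left out[n] == NULL */
}
```

With IFS set to space and colon and every mask byte 1, the input a  b::c: produces the four fields a, b, an empty field, and c. The trailing colon adds nothing because the loop ends once the delimiter has been consumed.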
Functions
-
static int is_ifs_split(const t_xbuf *buf, size_t i, const char *ifs)
True if buf->data[i] is in ifs and the mask allows splitting.
-
static int is_ifs_ws(const char *ifs, char c)
True if c is one of POSIX’s three IFS-whitespace bytes AND it appears in ifs.
-
static size_t push_field(t_field **fields, size_t *count, size_t cap, t_field f)
Append a field record to a growing array.
- Returns:
New capacity, or 0 on allocation failure (caller frees).
-
static void skip_ifs_ws(const t_xbuf *buf, const char *ifs, size_t *i)
Skip a run of IFS whitespace starting at *i.
-
static void consume_delimiter(const t_xbuf *buf, const char *ifs, size_t *i)
Consume one delimiter at *i: optional non-ws byte plus any surrounding IFS whitespace.
-
static t_field *collect_fields(const t_xbuf *buf, const char *ifs, size_t *out_count)
Walk the buffer and record each field.
- Returns:
Field array (caller frees) or NULL on allocation failure. When the buffer expands to zero fields, *out_count is 0 and the returned pointer is NULL - that is not an error.
-
static char **materialise(const t_xbuf *buf, t_field *fields, size_t count)
Convert collected (start, len) records to a NULL-terminated array.
-
static char **single_field(const t_xbuf *buf)
Build a single-field array holding the buffer verbatim.
-
static char **empty_array(void)
Build a zero-field result (single NULL pointer).
-
struct t_field
Field record (offset, length) into the expansion buffer.
We collect these first and only allocate the t_strdup-style char* array once we know the final count.