Day 21: Checking if a string contains shell wildcard (String::Wildcard::Bash)

About the series: perlancar's 2014 Advent Calendar: Introduction to a selection of 24 modules which I published in 2014.

A few months ago, while working on shell tab completion, I wanted a routine that tells whether a string contains a wildcard. For example, if I have this in the command-line buffer: emacs f*txt and press Tab, shells will usually be helpful and show me the list of filenames matching that wildcard. So I wanted to test whether f*txt contains a wildcard and, if so, do a glob() on it; otherwise I'd do a simple search for the files in the directory.

Actually, you can just glob() the string anyway without caring whether it contains a wildcard, or perhaps do a glob("$arg*") to make sure that it does contain one and thus won't confuse glob(), but somehow I didn't like that easy approach. And as I was sure there would be other cases where I'd need this routine (but couldn't do a glob() because there are no real files to check), I went ahead and wrote it anyway. The result is String::Wildcard::Bash. (To be complete, I also wrote String::Wildcard::DOS and String::Wildcard::SQL.)
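The "glob if there is a wildcard, otherwise search the directory" decision could be sketched roughly like this. Both looks_like_wildcard and complete_filename are hypothetical names made up for illustration, not part of String::Wildcard::Bash, and the wildcard check here is deliberately naive: it ignores backslash escapes, character classes and braces. bsd_glob comes from the core File::Glob module.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);

# Hypothetical and deliberately naive: any * or ? counts as a wildcard.
sub looks_like_wildcard {
    my ($word) = @_;
    return $word =~ /[*?]/ ? 1 : 0;
}

# Hypothetical completion helper: returns candidate file names inside $dir.
# Assumes $dir itself contains no glob metacharacters.
sub complete_filename {
    my ($word, $dir) = @_;

    if (looks_like_wildcard($word)) {
        # The word is a pattern, so let glob expand it against real files.
        my @matches = map { basename($_) } bsd_glob("$dir/$word");
        my @sorted  = sort @matches;
        return @sorted;
    }

    # No wildcard: plain prefix search over the directory entries.
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    my @sorted = sort grep { index($_, $word) == 0 } @names;
    return @sorted;
}
```

Calling complete_filename('f*txt', '.') would take the glob branch, while complete_filename('fo', '.') would take the prefix-search branch.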

I managed to find a single regex that does this and is good enough for my needs (you can view it in the source of String::Wildcard::Bash). Be warned that it's quite toothpicky.

Detecting the presence of bash wildcards is not as simple as detecting DOS and SQL wildcards, because there are other things to check besides the existence of the joker characters (* and ?) themselves. First, you have the backslash escape mechanism:

foo*bar    # has wildcard
foo\*bar   # no wildcard
foo\\*bar  # has wildcard
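A minimal sketch of the backslash rule, assuming we only care about jokers for now. The function name has_unescaped_joker is hypothetical and the regex is a simplified stand-in, not the one used by String::Wildcard::Bash. The idea is to consume the string as either "backslash plus one character" pairs or single non-backslash characters, so an escaped joker can never be seen as a bare one.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str has a * or ? not protected by a backslash.
sub has_unescaped_joker {
    my ($str) = @_;
    # Prefix: escaped pairs (\x) or plain characters, as few as possible,
    # followed by a joker that is therefore not part of an escaped pair.
    return $str =~ /\A(?:\\.|[^\\])*?[*?]/s ? 1 : 0;
}

# A single-quoted heredoc keeps every backslash exactly as typed.
my @samples = split /\n/, <<'END';
foo*bar
foo\*bar
foo\\*bar
END

for my $sample (@samples) {
    my $verdict = has_unescaped_joker($sample) ? 'has wildcard' : 'no wildcard';
    printf "%-10s # %s\n", $sample, $verdict;
}
```

For the three samples, in order, this reports "has wildcard", "no wildcard" and "has wildcard", matching the list above.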

Then you get character classes, which I'd categorize as wildcard too:

f[aeiou]o  # has wildcard
f\[aeiou]o # no wildcard

And you also have brace expansion, which I also categorize as a wildcard. Note that other wildcards like jokers and character classes are still expanded within braces (since bash expands braces first, and then expands jokers). Another quirk of braces is that you need at least two items (that is, a comma) for the shell to expand them; otherwise the braces are just literal. Examples:

a{}      # no wildcard, brace is literal
a{a}     # no wildcard, brace is literal
a{a,b}   # "has wildcard", brace is expanded into: aa ab
a{a*,b?} # has wildcard, brace is expanded and then the jokers are expanded
a\{a,b}  # no wildcard, brace escaped
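The "needs a comma" rule for braces can be sketched in the same style. Again, has_brace_expansion is a hypothetical name and the regex is a simplified illustration rather than the module's own: it does not handle nested braces, and it does not cover bash sequence forms such as {1..3}, only the comma-separated form discussed here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str has an unescaped {...} with a bare comma.
sub has_brace_expansion {
    my ($str) = @_;
    return $str =~ /\A(?:\\.|[^\\])*?   # escaped pairs or plain characters
                    \{                  # an opening brace that is not escaped
                    (?:\\.|[^\\{},])*   # first item (may be empty)
                    ,                   # the comma that makes it expandable
                    (?:\\.|[^\\{}])*    # remaining items
                    \}/sx ? 1 : 0;
}

my @samples = split /\n/, <<'END';
a{}
a{a}
a{a,b}
a{a*,b?}
a\{a,b}
END

for my $sample (@samples) {
    my $verdict = has_brace_expansion($sample) ? 'expanded' : 'literal';
    printf "%-9s # brace is %s\n", $sample, $verdict;
}
```

Only a{a,b} and a{a*,b?} are reported as expanded. A full detector would combine this check with the joker and character class checks, since any one of them is enough to make the string a wildcard.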

Aside from the abovementioned wildcards, bash does other types of expansions/substitutions too, but these are not considered wildcards. These include tilde expansion (e.g. ~ becomes the path of your home directory), parameter and variable expansion (e.g. $0 and $HOME), arithmetic expansion (e.g. $[1+2]), history expansion (!), and so on.
