Hey!

Bashing Bash

I recently realized that my bash scripts tend to rot with every single line I add. The snippet below explains it well:

#!/usr/bin/env bash
# this script is to be run every 30 minutes

for f in $(find -name "*.json"); do
    lockfile="$(basename $f).lock"
    if [ -f "$lockfile" ]; then
        continue
    fi

    touch "$lockfile"
    # exposes $USERNAME and $PASSWORD
    ./json_to_envvars.py "$f"
    # backgrounding so we can do many actions in parallel
    ./do_action.sh "$USERNAME:$PASSWORD" &
    rm -f "$lockfile"
done

There are multiple pain points here.

The single biggest mistake here, however, is using the file system as the state store. I see this tendency all the time in my scripts. Imagine trying to debug this mess when the whole file system is your state machine. I understand why containers are so hot; they restrict persistent state to volumes.

I thought I was clever when I got my college degree. Yet here I am, writing scripts like these for a living. And $lockfile might not even be writable by the current process. Shit.

Also, if the parameter to ./do_action.sh is supposed to be confidential, you're screwed by lurking ps aux cowboys. Such a silly thing, imagine using command line parameters for passing secrets. When people say C has footguns, they forget to mention that those footguns are still present, in other shapes, in bash.
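One way out of that, sketched in Python since that's where this post is headed anyway: hand the secret to the child over stdin instead of argv, so it never shows up in the process list. This assumes a hypothetical do_action.sh that reads "username:password" from standard input rather than from $1, which the original script does not do.

#!/usr/bin/env python3
# Sketch: pass the secret over stdin instead of argv, so it is not
# visible in ps output. Assumes a hypothetical ./do_action.sh that
# reads "username:password" from standard input instead of $1.
import subprocess

def do_action(username, password):
    proc = subprocess.Popen(["./do_action.sh"], stdin=subprocess.PIPE, text=True)
    proc.stdin.write(f"{username}:{password}\n")
    proc.stdin.close()   # child sees EOF once it has read the secret
    return proc          # keeps running in the background, like the & in bash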

What's the remedy? Just move to a real language. That way, you avoid most of these footguns.

The same logic, implemented in Python:

#!/usr/bin/env python3

import glob
import json

with open('locks.json') as f:
    locks = json.load(f)

def lock(file):
    locks[file] = True

def unlock(file):
    locks[file] = False

for file in glob.glob("**/*.json", recursive=True):
    if file == 'locks.json':
        continue  # don't treat our own state file as a job

    with open(file) as f:
        envvars = json.load(f)

    if locks.get(file):
        continue

    lock(file)
    arg = f'{envvars["username"]}:{envvars["password"]}'
    do_action(arg, background=True)  # stand-in for ./do_action.sh ... &
    unlock(file)

with open('locks.json', 'w') as f:
    json.dump(locks, f)

Now that I look at it, it might actually be the case that I am not a great system designer quite yet. The Python code above has race conditions of its own if multiple instances of the script are invoked: each instance reads locks.json at startup and only writes it back at the end, so they happily overwrite each other's locks. BUT! My talking points still stand rock solid.

That leads me to thinking: how do you design a system to be resilient to deadlocks, whilst allowing parallel processing? The file system is inherently parallelizable (just write to another .lock file) and has a single namespace.
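For what it's worth, the file system can give you both if you lean on the kernel instead of on marker files: a lock taken with flock(2) is released automatically when the holding process dies, so a crash can't leave a stale lock behind, and different files can still be locked and processed in parallel. A minimal sketch, with the actual work stubbed out:

#!/usr/bin/env python3
# Sketch: per-file advisory locks via flock(2). The kernel releases the
# lock when the owning process exits, so a crash never leaves a stale lock.
import fcntl
import glob

def do_action(path):
    print(f"would process {path}")   # stand-in for the real work

for path in glob.glob("**/*.json", recursive=True):
    lockfile = open(path + ".lock", "w")
    try:
        # Non-blocking: if another instance already holds this lock, skip the file.
        fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lockfile.close()
        continue
    try:
        do_action(path)
    finally:
        lockfile.close()   # closing the file descriptor releases the lock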

Unless we daemonize the Python script?

#!/usr/bin/env python3

import json
from datetime import datetime, timedelta

import pause  # third-party sleep-until library: pip install pause

# do_the_for_loop() is the glob/lock/do_action loop from above;
# save_locks() writes the locks dict back to locks.json.

with open('locks.json') as f:
    locks = json.load(f)

while True:
    next_run = datetime.now() + timedelta(minutes=30)

    try:
        do_the_for_loop(locks)
    except Exception:
        save_locks(locks)  # persist state even when a run blows up
        raise

    save_locks(locks)
    pause.until(next_run)

That's hot. I like this architecture a lot more. The state is written to disk on each run, successful or not. The script can be service-ified by systemd, so that it is restarted when it dies.
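For reference, a minimal unit-file sketch of that service-ification (the name and paths here are made up):

# /etc/systemd/system/do-actions.service  (hypothetical name and path)
[Unit]
Description=Process JSON action files every 30 minutes

[Service]
ExecStart=/usr/bin/python3 /opt/scripts/do_actions.py
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target

Then systemctl enable --now do-actions.service, and the loop comes back whenever it dies.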

Is this how a design session usually goes? Cause I need more of those at $MY_COMPANY then. Also, remember that I've only worked with corporate software since September last year; I'm still learning to become a code monkey.