Bookmarking, the UNIX way

This one is inspired (once again) by the undeepfaked Mental Outlaw. See the original YouTube vid here. The written-on-the-fly, proof-of-concept demo script looks something like the following:

Bookmark="$(xclip -o)"
File="$HOME/.local/share/bookmarks"

if grep -q "^$Bookmark" "$File"; then
	notify-send "Oops." "Already bookmarked!"
else
	notify-send "Bookmark added!" "${Bookmark#https://} is now saved."
	echo "$Bookmark" >> "$File"
fi

As Luke explained in the vid, xclip -o prints the current selection to standard output, so it’s like echo-ing from the clipboard. The if block checks whether the bookmark already exists and acts accordingly. I stripped https:// away because it takes up too much space on the notify-send banner without adding any information (assuming one isn’t visiting http sites anymore, hopefully).
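
To poke at the check-then-append logic without xclip or notify-send in the way, it can be pulled out into a small function. This is only a sketch: add_bookmark and its two arguments are names I made up, and I swapped the "^" regex for grep -xF (whole line, literal string) so that the "." and "?" in URLs are not read as regex characters.

```bash
# add_bookmark URL FILE (hypothetical helper, not part of the demo script)
# Appends URL to FILE unless FILE already contains it as a whole line.
add_bookmark() {
    local url=$1 file=$2
    # -x: match the whole line, -F: treat the URL as a literal string
    if grep -qxF -- "$url" "$file" 2>/dev/null; then
        return 1    # already bookmarked
    fi
    printf '%s\n' "$url" >> "$file"
}
```

Calling it twice with the same URL leaves a single line in the file, and the second call returns 1.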

The script will start to fall short when the bookmarks pile up. Searching for sites in dmenu (or whatever fuzzy finder you are using) won’t be easy when you only have URLs. It would be great if site titles (the string inside the <title> HTML tag) were prepended.

Boy do things get bloated!

Clearly, I was naive to be looking for a couple of lines of bash to pull site titles, if not some glorious one-liners. In reality, lines add up pretty quickly due to:

  • some sites serve compressed data instead of plain HTML text;
  • some sites break lines within a title tag;
  • some sites have multiple title tags for some reason and the code below grabs “the first one”;
    • caveat here: you’d be amazed some sites would have code like this:
      <!--	<title></title>  -->
              <title>ACTUAL SITE TITLE</title>
      so the script actually grabs the first nonempty title tag.
  • html encoding for special characters should be decoded.

The case against parsing html with regular expressions rings true here. Hence the non-automated part of the script: a dialog --inputbox asking for a customized title, if desired.
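
The "first nonempty title" rule from the list above can be tried in isolation. The sketch below reads HTML on stdin, so no network is needed; first_title is a made-up name, and I used a plain grep -o with [^<]* instead of the PCRE lazy match, assuming titles never contain a "<".

```bash
# first_title (hypothetical helper): print the first nonempty <title> on stdin.
first_title() {
    local t
    tr '\n' ' ' |                               # join lines so a title may span several
        grep -o '<title>[^<]*</title>' |        # one match per output line
        sed 's:<title>\(.*\)</title>:\1:' |     # drop the surrounding tags
        while IFS= read -r t; do
            # skip titles that are empty or whitespace only
            [[ $t =~ [^[:space:]] ]] || continue
            printf '%s\n' "$t"
            break
        done
}

printf '%s\n' '<!--	<title></title>  -->' '<title>ACTUAL SITE TITLE</title>' | first_title
# ACTUAL SITE TITLE
```

It still won’t cope with attributes inside the title tag or an uppercase <TITLE>, which is exactly why the manual override stays.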

The full script

#!/bin/bash

File="$HOME/.local/share/bookmarks"

TrimString() {
    : "${1#"${1%%[^[:space:]]*}"}"
    : "${_%"${_##*[^[:space:]]}"}"
    printf '%s\n' "$_"
}

HTML2Text () {
    : "${1//&nbsp;/ }"
    : "${_//&mdash;/-}"
    : "${_//&amp;/&}"
    : "${_//&lt;/<}"
    : "${_//&gt;/>}"
    : "${_//&quot;/\"}"
    : "${_//&#39;/\'}"
    : "${_//&#039;/\'}"
    : "${_//&ldquo;/\"}"
    : "${_//&rdquo;/\"}"
    : "${_//&raquo;/>>}"
    printf '%s\n' "$_"
}

Bookmark="$(xclip -o)"
RawTitles="$(wget --compression=auto -qO- "$Bookmark" | tr '\n' ' ' | grep -oP "<title>.*?</title>" | sed -n 's:<title>\(.*\)</title>:\1:p')"
while IFS= read -r Title; do
	[[ -n $Title ]] && break
done < <(printf '%s\n' "$RawTitles")
Title="$(TrimString "$(HTML2Text "$Title")")"
if grep -qF -- "$Bookmark" "$File"; then
	notify-send "Oops." "Already bookmarked!"
else
	User=$(dialog --inputbox "Bookmarked as \"$Title\" unless otherwise specified below:" 15 40 --output-fd 1);
	[[ -n $User ]] && Title="$User";
	notify-send "Added: $Title" "${Bookmark#https://} is now saved."
	echo "\"$Title\" $Bookmark" >> "$File"
fi

Some bash magic

The terse and esoteric-looking TrimString function is copied verbatim from dylanaraps’s pure-bash-bible on GitHub, for those not so familiar with it. It trims leading and trailing whitespace from a string. Generally, these operations fall under a category called shell parameter expansion in bash’s parlance.[1]

TrimString() {
    : "${1#"${1%%[^[:space:]]*}"}" # remove leading white spaces
    : "${_%"${_##*[^[:space:]]}"}" # remove trailing white spaces
    printf '%s\n' "$_"
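}

"#" and "%" are parameter expansion operators that remove the shortest matching pattern from the beginning and the end of a string, respectively. Doubled, "##" and "%%" remove the longest match instead (the greedy mode), which only makes a difference when the pattern contains a wildcard such as "*".[2]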

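A quick way to see shortest versus longest matching is to run the four operators against one string. The path below is made up for illustration.

```bash
path="/home/user/music/album.tar.gz"   # hypothetical sample value

echo "${path#*/}"    # home/user/music/album.tar.gz  (shortest prefix matching */)
echo "${path##*/}"   # album.tar.gz                  (longest prefix matching */)
echo "${path%.*}"    # /home/user/music/album.tar    (shortest suffix matching .*)
echo "${path%%.*}"   # /home/user/music/album        (longest suffix matching .*)
```
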
The colon command ":" and the special bash variable "$_" which expands to the last argument to the previous command work hand in hand here to BTFO temporary variables.

How? If we look at the first line, for example, "${1#"${1%%[^[:space:]]*}"}" expands a shell parameter and yields a string. Neither the expansion nor the resulting string is a shell command in itself, so it needs a command to hang on to. The colon command is a no-op that ignores its arguments, but the shell still expands them before calling it. "$_" then picks up that expanded argument on the next line.
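
The same ":" and "$_" chaining works for any sequence of expansions, not only trimming. Here is a small sketch with a made-up function, strip_scheme, that drops a leading https:// and then a trailing slash without a single temporary variable.

```bash
# strip_scheme (hypothetical helper): chain two expansions through $_
strip_scheme() {
    : "${1#https://}"     # the shell expands the argument, ":" does nothing with it
    : "${_%/}"            # $_ is the previous command's last argument
    printf '%s\n' "$_"
}

strip_scheme "https://example.org/"
# example.org
```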

The HTML2Text function uses bash’s string pattern replacement feature to decode HTML entities:

${parameter/pattern/string} # only the first match is replaced
${parameter//pattern/string} # all matches are replaced
${parameter/#pattern/string} # must match at the beginning of the expanded value of parameter
${parameter/%pattern/string} # must match at the end of the expanded value of parameter

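I’ve also found the last one useful for bulk file extension changes. An example could be bulk codec conversion of lossless music files, in which case an expansion such as ${FILE/%.flac/.opus} can be handy. Matching any alphanumeric extension takes an extended glob: after shopt -s extglob, ${FILE/%.+([[:alnum:]])/.opus} does the job.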
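
The four forms are easiest to tell apart side by side. The sketch below runs them on a made-up string, then uses the end-anchored form for the extension swap; opus_name is a hypothetical helper that only computes the new file name and leaves the actual encoding to whatever tool you prefer.

```bash
s="foo.bar.foo"       # hypothetical sample value
echo "${s/foo/X}"     # X.bar.foo  (first match only)
echo "${s//foo/X}"    # X.bar.X    (every match)
echo "${s/#foo/X}"    # X.bar.foo  (anchored at the start)
echo "${s/%foo/X}"    # foo.bar.X  (anchored at the end)

# opus_name FILE: swap a trailing .flac for .opus, leave other names alone
opus_name() {
    printf '%s\n' "${1/%.flac/.opus}"
}

for f in "01 intro.flac" "cover.jpg"; do
    echo "$f -> $(opus_name "$f")"
done
# 01 intro.flac -> 01 intro.opus
# cover.jpg -> cover.jpg
```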

Command for keybinding

Now that titles are prepended, the URL is the last field of each line, hence awk '{print $NF}'. And I guess xdotool key Return is pretty self-explanatory.

xdotool type $(grep -v '^#' ~/.local/share/bookmarks | dmenu -i -l 50 | awk '{print $NF}') && xdotool key Return
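
The filtering half of that pipeline can be checked without dmenu or xdotool. In this sketch the sample entries and the pick_urls name are invented; it only shows that comment lines are dropped and that the last whitespace-separated field is the URL even when the title contains spaces.

```bash
# hypothetical bookmark lines in the "Title" URL format used above
sample='"Example Site - Home" https://example.org/
# a commented-out entry
"Another page" https://example.com/docs?page=2'

# pick_urls (hypothetical helper): drop comment lines, keep the last field
pick_urls() {
    grep -v '^#' | awk '{print $NF}'
}

printf '%s\n' "$sample" | pick_urls
# https://example.org/
# https://example.com/docs?page=2
```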

  1. A comprehensive reference by the GNU project can be found here.

  2. Luke has a tutorial video for "#" and "%".