Authoring blog posts in Obsidian

I’m using Gitea, Drone, and Hugo to watch for commits to my Obsidian vault, extract blog posts, and publish them to one of my servers. I run my stuff on Digital Ocean droplets, and I use Caddy for serving static sites.

Why does it work?

It’s cheap, fast, and simple. Self-hosting means I have more control over what gets published. This could all be accomplished with GitHub Actions, but I’d need separate vaults/repositories for public and private content, or I’d have to make all my notes public.

Why doesn’t it work?

My original selection of pipeline images and commands was inefficient, incurring unnecessary network traffic and relying on third party package mirrors that suddenly started performing very badly.

Another important detail is media: the directory structure for my Obsidian vault and my site are very different.

I want to write blog posts with screenshots, media files, and more. Obsidian lets you drag and drop attachments, or link them manually with links in the form ![[path/to/attachment.png]]

Finally, Hugo is a great static site generator, but there are better options when you’re looking to publish content authored in Obsidian. In particular, the graph view is something that I’d love to bring into my blog. Luckily, Quartz is built directly on top of Hugo and comes with a theme and some helper utilities.

What are the options?

The Requirements

  • attachment links must be transformed from ![[attachments/whatever.png]] to ![[notes/post-name/whatever.png]]
  • the site must be built with Quartz instead of Hugo

The first choice is whether I “fix” this during authoring or during the publishing step. For the former, my options look something like this:

  1. manually typing the final URL into the note
  2. creating a complicated template system for generating Hugo shortcodes. In my head, this would use a prompter to let me select which attachment I want to insert, ask for resizing parameters, etc., and then generate a Hugo shortcode or an <img> tag.

Neither of these options is satisfactory to me. I’d love to just drag and drop a piece of media into my note inside Obsidian and simply not have to think about it any further.

This leaves implementing something during the publishing pipeline. Now that I’ve got my Drone pipeline working, it’s the perfect place to do transformations. This path presents a variety of possibilities falling on a spectrum between a bash script invoking sed and a custom Go program that parses frontmatter and markdown and applies pre-configured transformations.

Quartz

The Quartz repo has a few built-in options for turning your notes into a website: a Dockerfile, a Makefile, and instructions on how to build everything from scratch. All of these are great, and I played with them all at different times to figure out which was a good fit.

Pipelines: More than meets the eye

Unsurprisingly, I opted to extend my existing Drone pipeline with a transformer. This part of the pipeline has been in the back of my mind since the beginning, more or less, but it was much more important to get things stable first.

The pipeline I’m finally satisfied with looks like this, with checked boxes indicating what I had implemented at the start of this phase of the project.

  • Create a temporary shared directory, /tmp/blog
  • Clone the vault repository
  • Update submodules and use git-lfs to pull down attachments
  • Clone my forked Quartz repository into /tmp/blog
  • Copy posts from $VAULT/Resources/blog/post-name.md to /tmp/blog/content/notes/post-name/index.md
  • Scan all index.md files in /tmp/blog/content/ for links that look like ![[attachments/whatever.png]], find whatever.png, and copy it into the /tmp/blog/content/notes/post-name/ directory for that index.md
  • Scan all index.md files in /tmp/blog/content/ for links that look like ![[attachments/whatever.png]] and rewrite them to ![[notes/post-name/whatever.png]]
  • Run the Quartz build command
  • Copy the static site to the destination web server
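
Stitched together, the steps above correspond to a .drone.yml shaped roughly like this. This is only a sketch: the step names and images are illustrative, and the later steps are elided.

```yaml
kind: pipeline
type: docker
name: publish-blog

volumes:
- name: blog
  temp: {}

steps:
- name: fetch-attachments   # submodules + git-lfs for the cloned vault
  image: alpine/git
  commands:
  - git submodule update --init
  - git lfs pull

- name: clone-quartz        # static site skeleton into the shared volume
  image: alpine/git
  volumes:
  - name: blog
    path: /tmp/blog
  commands:
  - git clone -b hugo https://github.com/therealfakemoot/quartz.git /tmp/blog

# ...the copy-posts, gather-media, sanitize-links, build, and deploy
# steps follow the same shape, each mounting the blog volume.
```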

Hours and hours of debugging pipelines later

Drone Volumes

The linchpin of this whole operation is having a temporary workspace that all these tools can operate on in sequence. To that end, I used Drone’s Temporary Volumes to mount /tmp/blog in all the relevant pipeline steps.

Creating a temporary volume looks like this. I really couldn’t tell you what temp: {} is about; it certainly looks strange, but I never had the spare cycles to investigate.

```yaml
volumes:
- name: blog
  temp: {}
```

Once you’ve created the volume, a pipeline step can mount it to a desired path. See below for an example of using your created volume.

Quartz

Forking Quartz was easy; I’d done so late last year during another attempt to get this blog off the ground.

After a merge to get my fork up to date with upstream, I was able to slot this into the pipeline with the following.

```yaml
- name: clone-quartz
  image: alpine/git
  volumes:
  - name: blog
    path: /tmp/blog
  commands:
  - git clone -b hugo https://github.com/therealfakemoot/quartz.git /tmp/blog
```

This sets the stage for building the site, and for a step I implemented previously: !copy-posts-checkbox-screenshot.png

I opted to stop committing content to a blog repository and cloning the static site skeleton into the pipeline for a few reasons:

  1. I already have reproducibility by virtue of building things with Docker and having sources of truth in git.
  2. It was an unnecessary layer of complexity.
  3. It was an unnecessary inversion of control flow.

Configuring Quartz had its rocky moments. I’ve had to wrestle with frontmatter a lot; confusing the TOML and YAML syntaxes can break your build or break certain features like the local graph.
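
Hugo tells the two syntaxes apart by their delimiters: --- fences YAML (key: value) while +++ fences TOML (key = value), and mixing the key styles inside one block is an easy way to break things. The field values here are just illustrative:

```
---
# YAML frontmatter: colon-separated keys
title: "Authoring blog posts in Obsidian"
draft: false
---
```

The TOML equivalent swaps the delimiters for +++ and writes title = "..." instead.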

Gathering Media

This step ended up being pretty fun to work on. I took the opportunity to write this in Go because I knew I could make it fast and correct.

The process is simple:

  1. Walk a target directory and find an index.md file
  2. When you find an index.md file, scan it for links of the form [[attachments/whatever.png]]
  3. Find whatever.png in the vault’s attachments directory and copy it adjacent to its respective index.md file.

walkFunc is what handles step 1. You call err := filepath.Walk(target, walkFunc(attachments)) and it will call your walkFunc for every filesystem object the OS returns.

This piece of code checks if we’ve found a blog post and then chucks it to scanReader.

```go
func walkFunc(matchChan matches) filepath.WalkFunc {
	return func(path string, info fs.FileInfo, err error) error {
		if err != nil {
			return nil
		}
		if info.IsDir() {
			return nil
		}
		// Only blog posts are interesting; skip everything else.
		if !strings.HasSuffix(path, "index.md") {
			return nil
		}

		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		scanReader(f, path, matchChan)
		return nil
	}
}
```

scanReader iterates line-by-line and uses a regular expression to grab the necessary details from matching links.

```go
type Attachment struct {
	Filename string
	Note     string
}

type matches chan Attachment

func scanReader(r io.Reader, path string, matchChan matches) {
	log.Printf("scanning markdown file: %s", path)
	pat := regexp.MustCompile(`\[\[(Resources\/attachments\/.*?)\]\]`)

	s := bufio.NewScanner(r)
	for s.Scan() {
		tok := s.Text()
		matches := pat.FindAllStringSubmatch(tok, -1)
		if len(matches) > 0 {
			log.Printf("media found in %s: %#+v\n", path, matches)
			for _, match := range matches {
				// The note's directory name doubles as the post name.
				dirs := strings.Split(path, "/")
				noteFilename := dirs[len(dirs)-2]
				log.Println("noteFilename:", noteFilename)
				matchChan <- Attachment{Filename: match[1], Note: noteFilename}
			}
		}
	}
}
```

Finally, moveAttachment receives a struct containing context (the location of the index.md file and the name of the attachment to copy) and performs a copy.

```go
func moveAttachment(att Attachment, dest string) error {
	// Strip any extension from the note name to get its directory.
	destPath := filepath.Join(dest, strings.Split(att.Note, ".")[0])
	log.Println("moving files into:", destPath)
	_, err := copy(att.Filename, filepath.Join(destPath, filepath.Base(att.Filename)))
	return err
}

func copy(src, dst string) (int64, error) {
	sourceFileStat, err := os.Stat(src)
	if err != nil {
		return 0, err
	}

	if !sourceFileStat.Mode().IsRegular() {
		return 0, fmt.Errorf("%s is not a regular file", src)
	}

	source, err := os.Open(src)
	if err != nil {
		return 0, err
	}
	defer source.Close()

	destination, err := os.Create(dst)
	if err != nil {
		return 0, err
	}
	defer destination.Close()

	nBytes, err := io.Copy(destination, source)
	return nBytes, err
}
```

This ended up being the most straightforward part of the process by far. I packed this into a Dockerfile, using build stages to improve caching.

```dockerfile
FROM golang:latest as BUILD

WORKDIR /gather-media

COPY go.mod ./
# COPY go.sum ./

RUN go mod download

COPY *.go ./

RUN go build -o /bin/gather-media
```
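
That snippet is the build stage; a runtime stage along these lines would round out the multi-stage setup (illustrative only; the original may differ):

```dockerfile
FROM alpine:latest

# Pull just the compiled binary out of the build stage, so the image
# Drone runs stays small while gather-media remains on the PATH.
COPY --from=BUILD /bin/gather-media /bin/gather-media
```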

Integration into the pipeline looks like this:

```yaml
- name: gather-media
  image: code.ndumas.com/ndumas/gather-media:latest
  volumes:
  - name: blog
    path: /tmp/blog
  commands:
  - gather-media -target /tmp/blog/content/notes
```

Full code can be found here.

Link transformation ended up being pretty trivial, but it took way, way longer than any of the other steps because of an embarrassing typo in a find invocation. Another Docker image, another appearance of the blog volume.

The typo in my find was using contents/ instead of content/. My code worked perfectly, but the pipeline wasn’t finding any files to run it against.

```yaml
- name: sanitize-links
  image: code.ndumas.com/ndumas/sanitize-links:latest
  volumes:
  - name: blog
    path: /tmp/blog
  commands:
  - find /tmp/blog/content/ -type f -name 'index.md' -exec sanitize-links {} \;
```

sanitize-links is a bog-standard sed invocation. My original implementation tried to loop inside the bash script, but I realized I could refactor this into effectively a map() call and simplify things a whole bunch.

The pipeline calls find, which produces a list of filenames. Each filename is individually fed as an argument to sanitize-links. Clean and simple.

```sh
#! /bin/sh

echo "scanning $1 for attachments"
noteName=$(echo "$1" | awk -F'/' '{print $(NF-1)}')
sed -i "s#Resources/attachments#notes/$noteName#w /tmp/changes.txt" "$1"
cat /tmp/changes.txt
```

Lots of Moving Pieces

If you’re reading this post and seeing images embedded, everything is working. I’m pretty happy with how it all came out. Each piece is small and maintainable. Part of me worries that there are too many pieces, though. gather-media is written in Go, so I could extend it to handle some or all of the other steps.

!drone-builds-screenshot.png

For the future

Things I’d like to keep working on

  • include shortcodes for images, code snippets, and the like
  • customize the CSS a little bit
  • customize the layout slightly

Unsolved Mysteries

  • What does temp: {} do? Why is it necessary?