Inland Cursor

Vibe Coding an Emacs Package: Where It Succeeds and Where It Falls Short

Table of Contents

  1. Abstract
  2. Background
  3. Motivation
  4. Game Rules
  5. The Amazement
  6. The Frustration
    1. LLMs' Limits
    2. Lack of Control
    3. Lack of Growth
  7. Reflection
    1. It May Solve Your Problem
    2. It Still Requires Human Intervention
    3. Programming for Growth and Fun

Vibe coding an Emacs package to solve a personal problem, and my reflections on the experience.

Abstract

Vibe coding may solve your problem in the following three scenarios:

  1. You have no programming knowledge, but want to develop a program to solve a personal problem.
  2. You are a programmer, but are venturing into an unfamiliar domain.
  3. You are developing a small-scale or disposable program.

However, it may also make things more difficult when something goes wrong. Also, if you care about learning through programming, vibe coding can't help much. LLMs should still be used as a guide and accelerator, not as a replacement.

Background

When ChatGPT first launched, I was skeptical about its usefulness for real programming. At that time, LLMs were weak at logical deduction and mathematics. They had limited context windows, making handling large amounts of information difficult. And of course, there was hallucination – I still remember the disappointment when a book the model recommended turned out not to exist.

Years have passed, and many of those problems have improved significantly. More and more people now use LLMs in production-level engineering (Claude, GitHub Copilot, etc.). Even so, I didn't use LLMs in my day-to-day work – until recently.

Motivation

I use Emacs and org-roam to implement my personal note system – my Zettelkasten. Several Emacs packages serve similar purposes, such as denote or howm. I tried them and settled on org-roam. But I liked a feature in howm that org-roam lacks: searching for a keyword across all notes and previewing each match so you can see its context and the note it lives in.

That sounds like a grep job, and consult-grep should do it. However, I name my org-roam note files by unique identifiers like 20250910180700.org rather than by their titles. I do this because note titles may contain characters the filesystem doesn't support, and it means renaming a note's title doesn't require renaming its file. org-roam's database has its drawbacks, but compared with denote's use of filenames to hold metadata, I prefer this extra level of indirection.

All problems in computer science can be solved by another level of indirection except for the problem of too many layers of indirection.

grep and consult-grep were my first attempt, but their results only show filenames – meaningless datetime strings in my case – so I couldn't tell which note contained the keyword.

One benefit of Emacs is that you can (almost) always roll up your sleeves and scratch your own itch as long as you are capable. Unfortunately, I am not that capable: I know a bit of Lisp from reading SICP, but I'd never written any package-level Emacs Lisp. I read the ripgrep documentation and realized this would require focused time, not a short wait while a compiler runs.

Wait a minute – what about trying an LLM? I'd been reading about vibe coding and non-programmers building apps with LLMs. I'd also been chatting with ChatGPT a lot, which increased my confidence enough that I wanted to try programming with it.

Game Rules

I followed the Wikipedia definition of vibe coding:

The developer describes a project or task to a large language model (LLM), which generates code based on the prompt. The developer does not review or edit the code, but solely uses tools and execution results to evaluate it and asks the LLM for improvements.

So I would not write, edit, or examine a single line of code. I only used natural language to describe goals, report errors, and give feedback based on execution results.

The LLM used was ChatGPT 5.

The Amazement

At first, ChatGPT surprised me by understanding my description and implementing features correctly. It made minor mistakes, but after I reported an error or incorrect behavior, it quickly found the cause and produced a patch.

In two hours, it implemented the following features from scratch:

  1. Use grep or ripgrep to search a given directory for a keyword and display results in a dedicated buffer.
  2. When moving the cursor among matches in that buffer, open a preview window showing the file content at the matched line, with the matching line and the keyword highlighted.
  3. Group matching results by file, and name each group after the file's #+title if it is an .org file that declares one; otherwise fall back to the filename.
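The third feature boils down to a small grouping step over grep's (file, line, text) output. A minimal sketch of that logic, written in Python purely for illustration (the real package is Emacs Lisp, and the function names here are my own, not the package's):

```python
import re
from collections import defaultdict
from pathlib import Path

def org_title(path):
    """Return the #+title of an .org file, or None if it has none."""
    if path.suffix != ".org":
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        m = re.match(r"#\+title:\s*(.+)", line, re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None

def group_matches(matches):
    """Group (filename, line_no, text) grep matches by file,
    labeling each group with the note's #+title when available,
    otherwise with the bare filename."""
    groups = defaultdict(list)
    for fname, line_no, text in matches:
        path = Path(fname)
        label = org_title(path) or path.name
        groups[label].append((line_no, text))
    return dict(groups)
```

With this mapping, a match inside 20250910180700.org shows up under the note's human-readable title instead of the opaque datetime filename.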

I respected the game rules faithfully. Being unfamiliar with Emacs Lisp and Emacs development, I could hardly improve the code myself. At that point, I had a viable package that met my initial requirement and was practically useful. I was so satisfied that I created a GitHub repository for it and uploaded the code.

The Frustration

Then I got ambitious and tried to push further. I wanted to add:

  1. Support for multiple keywords with logical OR/AND operators.
  2. Live preview when typing keywords.
  3. Integration with vertico.
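The first of these amounts to compiling the keyword list into a predicate applied to each candidate line. A minimal sketch of the idea (again in Python for illustration; the name and signature are hypothetical, and the package itself would do this in Emacs Lisp):

```python
def make_matcher(keywords, mode="and"):
    """Build a predicate that reports whether a line of text matches
    all (mode="and") or any (mode="or") of the given keywords."""
    combine = all if mode == "and" else any
    return lambda line: combine(k in line for k in keywords)
```

The live-preview and vertico features are harder because they add asynchrony and UI state on top of this core, which is where the trouble began.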

Unfortunately, implementing these features proved more difficult than the first set. I ran into three main sources of frustration.

LLMs' Limits

The first source of frustration is obvious: LLMs can't achieve everything I asked for, and they sometimes can't recover from their own errors. That is where the project ultimately stalled.

As the program grew more complex, asking the LLM to generate the whole package led to more syntax errors – especially mismatched parentheses in Emacs Lisp. It's impossible to fix such problems without inspecting the code. Reporting syntax errors back to ChatGPT often didn't help; it sometimes kept producing syntactically incorrect code while claiming it was correct. Luckily, I know basic Lisp syntax and could fix these issues manually. For non-programmers, that kind of fix would be impossible.

The first two additional features did get implemented after about five hours of iteration, but only with a lot of repetition and struggle. Most of the time I was just repeating the same problem to ChatGPT until it finally produced a correct implementation. The third one never reached a perfect state: behavior was correct but performance was poor. I asked ChatGPT to improve performance using timers or asynchronous processes, but it couldn't do so without breaking existing features. After about two days of fighting, I gave up and admitted LLMs can't solve every problem – even ones known to be solvable.

Lack of Control

The problems that LLMs can't solve led to the second source of frustration: I couldn't solve those problems either.

Vibe coding is all well and good while things go well: the LLM generates working code and you can ask for features or fixes. But when things go wrong, the human must check everything to find the cause. Did I mis-copy a piece of code? Was the code incorrect? Were the LLM and I even on the same page? Under the vibe coding rule, I couldn't examine the code to answer these questions. And even if I broke the rule, working on an LLM-generated codebase would be much more difficult than building it from the ground up myself. Convenience came at a price.

All I could do was report that "things were off" and hope the LLM fixed it next time. It felt like standing at a slot machine, hoping for a win. Because I didn't understand the construction of the machine – in this case, the LLM and the program it produced – I was relying on sheer luck. Programming attracts me because programmers have (almost) full control over their programs; vibe coding eliminates that control.

Lack of Growth

The third and final source of frustration is the lack of growth.

After I gave up on polishing the third feature and stopped development, I had not grown any more than the person who started the vibe coding. Vibe coding did yield some useful outcomes, but I had not learned anything. I was still someone who knew almost nothing about Emacs package development, though I could have learned something had I read the code the LLM generated. If you are not a programmer, don't care about learning programming, or are just developing a simple, disposable script, this is not an issue. But for anyone who cares about growth and is trying to build software that will serve for a long time, it cannot be ignored.

Reflection

Having tried vibe coding, I have several thoughts.

It May Solve Your Problem

It's amazing that you can instruct an LLM to implement ideas and iterate using execution feedback. If the LLM is capable of solving your problem, it's perfectly fine to let it do so. If you are not a programmer but want to develop software to solve a personal problem, do try it. Alternatively, if you are a programmer venturing into an unfamiliar domain, or are developing a small-scale or disposable program, it may suit your needs.

It Still Requires Human Intervention

However, once things go wrong, you may have no way to fix them. Without knowing the implementation, you can't solve problems the LLM cannot. A human expert may still be required to solve the problem, and they may have to work from the ground up again because the LLM-generated code may be difficult to read, edit, and maintain. This is a major reason I remain doubtful about fully automatic code generation. Which is more time-consuming: solving the problem ourselves, or verifying a solution offered by AI? To me, it feels like the P vs NP problem.

Programming for Growth and Fun

Even if future LLMs become perfect oracles, vibe coding still deprives us of the fun of solving problems ourselves. It's similar to Go or chess: even if AI can easily defeat the most competent human player, we still play for the joy of the activity. (This is one reason we build AI: to free us from repetitive labor and let us do things for their own sake.)

What's more, although vibe coding produced a usable package that solves my problem, I didn't grow during the process – I learned nothing. This makes me consider LLM-aided coding far more valuable than pure vibe coding. LLMs help beginners and programmers venturing into unfamiliar domains get hands-on quickly. We don't need to master all the basics first: programming is an art you learn by doing, and LLM-aided programming can be an excellent teacher tailored to your needs. (Though you still have to be cautious about its answers and instructions!)

As a result, I am planning to rewrite the package. This time, I won't let the LLM write all the code. I may ask for architecture and targeted implementations, but I'll write – or at least review – the code so I understand how it works and maybe learn a trick or two.

#English