Last year I set out to build something simple: a web editor where you describe a document in plain English, and AI writes the LaTeX for you. Twelve months, three rewrites, and thousands of user compilations later, I have data that surprised me—and changed how I think about AI-assisted writing.

This isn’t a product pitch. It’s a post-mortem on what actually happens when you let a large language model write LaTeX, and the engineering decisions that turned a 32% first-compile success rate into 94%.

The Problem Nobody Talks About: AI-Generated LaTeX Doesn’t Compile

Every demo of “AI writes your paper” shows the happy path: user types prompt, AI produces beautiful document. What they don’t show is the error log.

When I first wired Claude into a LaTeX compilation pipeline, 68% of AI-generated LaTeX failed on first compile. Not because the AI was bad at LaTeX—it’s actually remarkably good at document structure. The failures were almost always one of three things:

Top 3 AI-Generated LaTeX Failures

Error Type                   | % of Failures | Example
Missing packages             | 41%           | Uses \usepackage{tikz} but forgets pgfplots for plots
Unmatched environments       | 33%           | \begin{align} with no \end{align} (truncated output)
Invalid commands in context  | 26%           | \textbf inside math mode, \href without hyperref

The missing package problem was the worst. LLMs know thousands of LaTeX commands but have fuzzy knowledge of which package provides which command. Ask for a chemical formula and you get \ce{H2O}—correct syntax, but it won’t compile without mhchem. Ask for a circuit diagram and you get perfect circuitikz code that requires a package the AI forgot to include.

The Auto-Fix Loop: How We Got to 94%

The breakthrough wasn’t making the AI write better LaTeX. It was feeding compilation errors back to the AI and letting it fix its own mistakes.

Here’s the actual pipeline:

  1. User sends prompt (“Write a paper about quantum computing”)
  2. AI generates LaTeX document
  3. Server runs pdflatex (or lualatex for Unicode)
  4. If compilation fails, error log is sent back to AI with the instruction: “Fix the LaTeX errors. Here’s the log.”
  5. AI generates corrected version
  6. Repeat up to 2 times

Compilation Success Rates

Attempt             | Cumulative Success | New Fixes
First compile       | 32%                | -
After 1st auto-fix  | 81%                | +49%
After 2nd auto-fix  | 94%                | +13%

Based on production data from Artitex compilations. “Success” = pdflatex exits with code 0.

The jump from 32% to 81% on the first auto-fix is dramatic. That's because 41% of failures were missing packages. When you show the AI a log reading "! Undefined control sequence" pointing at \ce, it immediately adds \usepackage{mhchem}. It's a trivial fix that the AI just needed the signal to make.
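
A minimal before/after for the mhchem case (an illustrative reconstruction, not an actual transcript from the pipeline):

```latex
% Before auto-fix: \ce is used but mhchem was never loaded,
% so pdflatex stops with "! Undefined control sequence."
\documentclass{article}
\begin{document}
Water is \ce{H2O}.
\end{document}

% After auto-fix: the model sees the log and adds one line.
\documentclass{article}
\usepackage{mhchem}
\begin{document}
Water is \ce{H2O}.
\end{document}
```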

The second auto-fix catches the harder cases: nested environment mismatches, conflicting packages, and edge cases in math mode. After two attempts, the remaining 6% are usually documents that need packages we don’t have installed (like exotic fonts or specialized scientific packages).

Crucially, auto-fix attempts are free for users. We don’t charge credits for error correction—only for the initial generation. This was a deliberate choice: penalizing users for AI mistakes kills trust.

Three AI Models, One Editor: What the Tiers Actually Do

We offer three Claude models at different price points. Here’s what we learned about when each one matters:

Model Performance by Task

Task                      | Haiku (Speed) | Sonnet (Balanced) | Opus (Quality)
Simple letter/memo        | 98%           | 99%               | 99%
Research paper structure  | 76%           | 91%               | 96%
Complex math equations    | 54%           | 82%               | 93%
TikZ diagrams             | 31%           | 67%               | 84%
Multi-section thesis      | 42%           | 78%               | 91%

First-compile success rates before auto-fix. All models reach 90%+ after auto-fix for most tasks.

The takeaway: model choice matters most for complex documents, but auto-fix equalizes them for simple ones. If you’re writing a quick letter, Haiku is indistinguishable from Opus after auto-fix. If you’re generating a thesis with equations and TikZ diagrams, Opus saves you auto-fix cycles and produces more coherent structure.

This is why we give free users access to Haiku with auto-fix rather than a limited version of Sonnet. For 80% of use cases—homework, short reports, CVs—Haiku + auto-fix delivers the same outcome.

What Overleaf Gets Wrong About AI

Overleaf is excellent software. I use it. But their approach to AI (and most “add AI to X” products) makes a fundamental mistake: they bolt AI onto an existing workflow instead of designing the workflow around AI.

In Overleaf, AI is an autocomplete feature. You write LaTeX, and AI suggests the next line. This is useful but limited—it assumes the user already knows LaTeX well enough to evaluate suggestions. That’s backwards for the people who need AI most: beginners who can’t write LaTeX at all.

Artitex inverts this: the user writes in English, AI writes LaTeX, and the editor handles compilation. The user never needs to see or edit LaTeX directly (though they can—we show the source and let you modify it).

This difference shows up in user behavior. In our chat logs, the most common second message after a document is generated is a refinement:

  • “Make the introduction longer”
  • “Add a section about methodology”
  • “Change the font to Times New Roman”
  • “Add page numbers in the footer”

These are instructions a non-LaTeX user would never be able to act on in Overleaf without Googling for 20 minutes. In a chat-driven editor, they’re one sentence.

The Four Format Problem

We support LaTeX, Markdown, AsciiDoc, and rich text. This wasn’t the plan—the plan was LaTeX only. But user feedback was clear: people wanted the AI writing assistant for Markdown too, and for documents that didn’t need LaTeX at all.

Supporting four formats created an interesting engineering challenge. Each format has different image embedding syntax:

LaTeX:    \includegraphics[width=0.8\textwidth]{image.png}
Markdown: ![Caption](image.png)
AsciiDoc: image::image.png[Caption,width=80%]
HTML:     <img src="image.png" alt="Caption">

The AI needs to know which format the user is working in, and generate the right syntax. We pass format context with every chat message. This sounds simple but caused weeks of bugs where the AI would generate Markdown image syntax in a LaTeX document because a previous message mentioned “markdown” in passing.
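
The four templates above can be captured in one lookup. This is a sketch (the function name is hypothetical, and the production system passes format context to the model rather than templating directly):

```python
def image_embed(fmt: str, path: str, caption: str = "") -> str:
    """Return the image-embedding syntax for the given document format."""
    templates = {
        # LaTeX's \includegraphics takes no caption; that lives in a figure env
        "latex":    rf"\includegraphics[width=0.8\textwidth]{{{path}}}",
        "markdown": f"![{caption}]({path})",
        "asciidoc": f"image::{path}[{caption},width=80%]",
        "html":     f'<img src="{path}" alt="{caption}">',
    }
    try:
        return templates[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}")
```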

Version History: Why We Track AI vs Manual Changes

Every AI-generated change creates a version. Every manual edit you save creates a version. But we tag them differently: ai vs manual.

This matters because AI changes are high-variance. The AI might rewrite your entire document when you asked it to “fix the spacing.” Having labeled versions means you can always roll back to the last known-good state, whether that was your manual edit or a previous AI generation.

We store up to 50 versions per document. That sounds like a lot, but a typical document goes through 8–15 AI interactions, each creating a version. Even counting manual saves, 50 versions covers a realistic editing session.
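
A minimal sketch of the tagged, capped history (names are hypothetical; a bounded deque gives the keep-the-latest-50 behavior):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

MAX_VERSIONS = 50  # per-document cap, as described above

@dataclass
class Version:
    content: str
    source: str  # "ai" or "manual"

class VersionHistory:
    def __init__(self):
        # deque with maxlen silently drops the oldest version at the cap
        self._versions = deque(maxlen=MAX_VERSIONS)

    def save(self, content: str, source: str) -> None:
        assert source in ("ai", "manual")
        self._versions.append(Version(content, source))

    def last_manual(self) -> Optional[Version]:
        """Find the most recent manual edit to roll back to."""
        for v in reversed(self._versions):
            if v.source == "manual":
                return v
        return None
```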

The Credit System: Honest Pricing in a World of “Unlimited AI”

Many AI products advertise “unlimited” usage and then throttle you, degrade quality, or go bankrupt. We chose transparent, token-based pricing.

Our costs are real: Claude API calls cost money per token. We apply a 10x markup on raw API costs (which covers infrastructure, compilation servers, storage, and margin). The free tier gives 5 credits per day—enough for 2–3 documents.

Why 10x? Because the value isn’t just the API call. It’s the compilation pipeline, the auto-fix loop (which makes additional API calls for free), the PDF rendering, the version history, and the format-specific export. Stripping those out and just using ChatGPT to generate LaTeX works—but then you need Overleaf to compile it, a terminal to debug errors, and git to track versions. We bundle the entire workflow.

What I’d Do Differently

If I were starting over:

  1. Start with one format. Supporting four formats quadrupled the testing surface. LaTeX + Markdown would have covered 95% of users.
  2. Build auto-fix from day one. We added it in month 3 after seeing the failure rates. It should have been the first feature after basic generation.
  3. Don’t fight the AI’s instincts. Early on we had elaborate prompt templates to force specific document structures. Simpler prompts with good system instructions produce better results because the AI can use its training rather than following rigid templates.
  4. Invest in error messages. When LaTeX compilation fails, the raw log is incomprehensible to beginners. We now parse error logs into human-readable messages, but this was an afterthought that should have been a priority.
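
A toy version of that log parsing from point 4 (the patterns and wordings here are illustrative, not the production rule set):

```python
import re

# Map common pdflatex log patterns to plain-English explanations.
LOG_RULES = [
    (re.compile(r"! Undefined control sequence\.\s*.*?(\\\w+)", re.S),
     "The command {0} isn't defined - a \\usepackage line is probably missing."),
    (re.compile(r"! LaTeX Error: File [`']([^']+)' not found"),
     "The package file {0} isn't installed on the compile server."),
    (re.compile(r"! LaTeX Error: \\begin\{(\w+)\} on input line (\d+) ended by \\end\{(\w+)\}"),
     "\\begin{{{0}}} (line {1}) is closed by \\end{{{2}}} - the environments don't match."),
]

def humanize(log: str) -> list[str]:
    """Turn a raw pdflatex log into beginner-readable messages."""
    messages = []
    for pattern, template in LOG_RULES:
        for match in pattern.finditer(log):
            messages.append(template.format(*match.groups()))
    return messages
```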

What’s Actually Hard About This

The AI part isn’t the hard part. Claude writes good LaTeX. The hard parts are:

  • LaTeX compilation infrastructure. Running pdflatex in a container, managing temp files, handling timeouts for infinite loops, installing the right packages without bloating the image to 5GB.
  • Streaming. The AI generates tokens one at a time. Showing that stream in a chat interface while simultaneously preparing to compile the final result is a concurrency challenge.
  • Context management. A 20-message conversation about a document contains a lot of noise. Sending all of it to the AI wastes tokens and confuses the model. Sending too little loses context. We dynamically size the context window based on message relevance.
  • PDF rendering in the browser. Compiled PDFs need to display in real-time. PDF.js works but is heavyweight. Iframe-based preview is simpler but has CORS issues.
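
The context-management bullet can be sketched as a token-budget trim. This toy version keeps only recency (the relevance scoring mentioned above is not reproduced, and the 4-chars-per-token estimate is a rough assumption):

```python
def trim_context(messages: list[dict], budget: int,
                 count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    """Keep the newest messages that fit within a token budget, in order."""
    kept = []
    used = 0
    # Walk from newest to oldest so recent turns win the budget
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```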

Try It

Artitex is free to use with 5 AI credits per day. No credit card required. If you write academic papers, theses, or technical documents, give it a try and tell me what breaks—I’m still fixing things.

If you’re building something similar, I’m happy to talk architecture. The auto-fix loop pattern is applicable to any AI + compiler pipeline (code generation, SQL, CSS, etc.).