MCPs solve real problems—but they are far from perfect. In the first post on Model Context Protocol I laid out the general purpose and value of MCPs, along with some issues you might encounter. In the second post I showed a useful example and focused on the utility, current immature functionality, and likely future of MCPs. This time we’ll focus on the development side: the choices that shaped the Tab Generator, the unexpected problems that emerged, and what I learned about making MCPs that actually work with LLMs.
Understanding MCP development reveals four important realities:
- While the MCP standard itself is relatively simple, the business logic required to solve real problems can be arbitrarily complex.
- MCPs are traditional code solving the traditional problems that LLMs fundamentally cannot address.
- The documentation you write for an LLM matters as much as the code itself—a fundamentally different paradigm than traditional API design.
- Even perfectly functioning MCPs cannot guarantee LLM behavior, which has serious implications for anyone building production systems with them.
MCPs Start Simple
Developing an MCP is much like building a single-purpose server application. For most MCPs the actual interfaces are quite simple. This is necessary because complex interfaces, like complex APIs, are hard for LLMs to properly understand and use.
The basic architecture for MCPs is simpler than the documentation might suggest. You need to create a server that exposes MCP-compatible interfaces with associated documentation. Frameworks exist to handle most of the boilerplate work.
Claude can easily generate a basic MCP to get you started. In the case of the Tab Generator I asked Claude to create the code to generate basic tabs for guitars. It was able to quickly generate working code! LLMs have good success rates when working with new code for small apps.
The development challenge for these servers is not really in the MCP itself but in three other, more traditional areas:
- Your business logic, the code you must write to solve the particular problem the MCP addresses.
- Your non-functional requirements (security, scaling, observability, etc.)
- The tech stack you use, including languages, libraries, build systems, tool chain, source code control, configuration, hosting, etc.
I spent quite a bit of time figuring out how to structure the project, build it, configure it, and host it (more on this below). Someday, probably soon, much of this will be contained in wizards that can guide you through the process.
Figure 1: I’m experimenting with generating infographics using Google’s NotebookLM. In this case I don’t think the illustration showing the MCP layers is particularly informative, but the ability to generate these infographics is yet another impressive LLM capability I will spend more time with.
MCPs Are Traditional Code
The process of developing an MCP is that of developing a traditional server, using common languages and solving normal coding and technical stack issues. There are plenty of sources you can find that tell you how to build an MCP. Here, I’ll give an overview of the process I went through, and what I learned from it.
Coding Choices
Most MCPs are written in either JavaScript (or TypeScript) or Python. I chose Python because that is where I have been spending most of my time recently. While developing the MCP I did get the impression that most people choose JavaScript, resulting in more support for that language and ecosystem.
I decided to use FastMCP for Python, which is related to FastAPI–both are frameworks that let you quickly develop server-like functionality in Python. Claude is familiar with the framework and can easily generate and modify simple FastMCP implementations.
While typing is still not Python’s strong suit, I wanted to develop this MCP using best practices. I adopted Pydantic, which works well with both FastAPI and FastMCP and provides parameter checking for the interfaces. Claude seemed reasonably well versed in the technology and could generate initial code that leveraged Pydantic. But it was (like so much AI coding) uneven in how well it applied the technology, missing some useful circumstances in which type checking should be utilized.
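As a sketch of the kind of parameter checking Pydantic provides, consider models like these. The field names and constraints here are hypothetical, not the Tab Generator's actual schema.

```python
# Sketch of Pydantic models for tool input/output validation
# (illustrative field names, not the Tab Generator's real schema).
from typing import List
from pydantic import BaseModel, Field

class TabRequest(BaseModel):
    instrument: str = Field("guitar", description="Instrument name")
    tuning: List[str] = Field(
        default=["E", "A", "D", "G", "B", "E"],
        description="String tuning, low to high",
    )
    measures: int = Field(4, ge=1, le=64, description="Number of measures")

class TabResponse(BaseModel):
    tab: str                       # the rendered UTF-8 tablature
    warnings: List[str] = []       # non-fatal validation notes

# Out-of-range or malformed input raises ValidationError before
# any tool logic runs, so the LLM gets a clear, early error.
req = TabRequest(instrument="ukulele", tuning=["G", "C", "E", "A"], measures=8)
```

The payoff is that invalid LLM-supplied arguments fail loudly at the interface boundary instead of producing garbled tabs deep in the business logic.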
I started with very basic tab generation requirements. With the assistance of Claude, and the use of the Python libraries, the basic code took only a day to flesh out into good form. The various extensions took quite a bit longer to integrate–as previously mentioned, Claude works best off of new, clean code bases. Note that I used baseline Claude, not Claude Code, for this project. From what I hear, Claude Code might have handled the refactoring, debugging, and overall style better than Claude itself.
```python
# Initialize FastMCP server
mcp = FastMCP("Tab Generator")

@mcp.tool()
def generate_tab(tab_data: str) -> TabResponse:
    """
    Generate UTF-8 tablature for stringed instruments from structured JSON input.
    ...
    """
```
Figure 2: This small snippet, showing the main interface for the tab generator, illustrates many of the points I am making about the simplicity of MCPs, along with the use of FastMCP, Pydantic, and docstrings as context.
Over time I worked with Claude to extend the functionality, adding support for additional stringed instruments, timing, techniques, and other advanced functionality. This is where LLMs can struggle. Claude sometimes nailed the enhancements and bug fixes, but often it struggled to understand and modify existing code–even if it originally generated the code. In general it did a poor job creating clean, modular code off of an existing code base. It used anti-patterns creating functions and files with prefixes like NewYYY and EnhancedYYY, rather than refactoring code inline. Most of the week or so I worked on this project was spent rewriting poorly structured code and fixing tricky bugs that Claude could not understand.
Debugging The Code
If you’ve used AI tools to assist with your coding you know that debugging can be a mixed bag. In some instances the model can tell you exactly how to fix your code. In other situations it will lead you on a wild goose chase as it confidently insists you try multiple fixes that don’t address the issue. This work was no different.
I have yet to find a model that actually steps through the code in a debugger, noting results, values, and changes until it finds the issue. To this point in time they have consistently asked me to print intermediate results and have used that information to try to ascertain issues. This is a slow process with mixed results. I have to believe that at some point LLMs will be able to do the debugging for themselves.
The debugging problem is exacerbated when the LLM has generated hundreds of lines of code, leaving you to debug without a deep knowledge of the codebase and its foibles. Ironically, debugging the code forces you to develop that deeper knowledge of the codebase.
Debugging the Technology
As always with modern software systems, quite a bit of your setup and debugging time with an MCP will be spent dealing with the environment, tools, libraries, systems, changing technologies, and updated releases. Modern coding moves fast. At this early stage for MCPs, they move quicker still! I spent at least as much time working through issues outside of the code as I did inside of the code, to ensure that my MCP could reliably function, launch, be hosted, get detected, and generate data. As one technology (library, tool, or environment) updated it created a cascade effect on the whole system, forcing me to revisit other aspects of my project.
A best practice is to ensure that you are consulting the latest versions of documentation for the systems you plan to use. When looking at coding examples, pay close attention to their environment, libraries, and tool chains–these may well not be up-to-date. You will likely need to either update them to the latest versions or determine how you can install older versions for compatibility.
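One way to catch version drift early, sketched below under the assumption of a Python stack: have the server check its key dependencies at startup and fail fast with a clear message. The package names and expected major versions are illustrative.

```python
# Sketch: fail fast at startup if key dependencies drift from the
# versions the MCP was developed against (package names illustrative).
from importlib.metadata import version, PackageNotFoundError

EXPECTED = {"fastmcp": "2", "pydantic": "2"}  # expected major versions

def check_versions(expected=EXPECTED):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for pkg, major in expected.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        if not installed.startswith(major + "."):
            problems.append(f"{pkg} {installed} is not {major}.x")
    return problems
```

A check like this turns a mysterious runtime failure after a library update into an explicit message at launch.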
I expect that most MCPs will be set up by way of wizards on hosted systems in the near future, hiding much of this complexity from developers. The wizards can take care of the particulars for the configuration, while the hosted system keeps itself up-to-date with the latest advances. This is how this blog is maintained, hosted as a WordPress site, which rarely requires manual intervention. Instead of blog articles developers will provide the business logic, and the hosting system will handle all of the system’s issues.
Documentation Is Key
While the coding was relatively easy, the documentation was surprisingly involved and important. Think about how an LLM works: You give it context which it uses to figure out what to generate. The better the context (the prompting), the better the output. For FastMCP and Python the docstring comments associated with the interfaces are the documentation (or context). FastMCP ensures that they are returned to the LLM when it requests information about the interfaces as part of its MCP discovery process.
With poor or incomplete context the LLM won’t know how to properly leverage the code. Documentation must specify:
- the input and output formats (the JSON payloads)
- intentions and usage of the code
- special cases
- error conditions
- examples of input and output formats for various scenarios
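To make the list above concrete, here is a hedged sketch of what such a docstring might look like. The schema, field names, and error convention are hypothetical; in the real server the function would be registered with `@mcp.tool()` so FastMCP returns this text to the LLM during discovery.

```python
# Sketch of a richly documented tool interface (hypothetical schema;
# registered with @mcp.tool() in a real FastMCP server).
def generate_tab(tab_data: str) -> str:
    """Generate UTF-8 tablature from a JSON specification.

    Input: a JSON string with keys:
      - "instrument": "guitar", "bass", or "ukulele" (default "guitar")
      - "notes": list of {"string": int, "fret": int, "beat": float}

    Output: a monospaced UTF-8 tab block. Render it verbatim in a
    code fence; do not reflow, realign, or decorate the text.

    Special cases: an empty "notes" list yields empty measures.
    Error conditions: malformed JSON or out-of-range frets return a
    message starting with "ERROR:" instead of raising an exception.

    Example input:
      {"instrument": "guitar",
       "notes": [{"string": 6, "fret": 3, "beat": 1.0}]}
    """
    ...
```

Note how the docstring addresses the LLM directly ("render it verbatim"); it is as much a prompt as it is API documentation.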
All the tricks and techniques involved in prompting apply when creating MCP documentation. LLMs are surprisingly effective at improving documentation meant for themselves. Ask Claude to critique your docstrings for clarity, identify missing edge cases, or suggest better examples. It will often catch ambiguities that would confuse it during actual MCP calls.
I used this iteratively with the Tab Generator–Claude would suggest adding format examples for time signatures, clarifying which parameters were optional, or restructuring sections for better discoverability. Not every suggestion was good (it occasionally proposed overly verbose examples or redundant explanations), but the collaborative process consistently improved the documentation’s effectiveness.
You must keep your documentation up-to-date with your code or LLMs will not properly find and use new functionality. As you add more documentation for your MCP’s various use cases the documentation may lose organization and impact. Like the code itself, Claude struggled with larger, more involved documentation, which required some manual work on my part to edit and reformat the text. The age-old issue of keeping documentation in sync with the code has reached the AI era!
Documentation is so important that even a small app like the tab generator might end up with hundreds of lines surrounding each interface that is exposed to the LLM.
Testing Trick: A Fresh Context
One trick for testing the documentation, and the MCP itself: Spin up a new chat session (context) to test your MCP and the documentation that guides the LLM. If you use a context in which you’ve been working for a while the LLM may have ‘learned’ much about the MCP from overall context. But in a new chat session it will need to rely solely on the documentation–this is what you really need to test.
LLMs are Probabilistic Machines
Remember the nature of LLMs: All of your carefully thought out documentation intended to guide your LLM to a successful MCP interaction is really just more context to influence the probabilities. Despite your explicit instructions the LLM may end up ignoring this information and do something else.
In the case of the tab generator, where I repeatedly emphasize that the tabs must be rendered closely based upon their specification, Claude would sometimes ‘decide’ to reformat them or add extra information (or not even show them), frustrating my attempts to generate legible tabs.
Figure 3: MCPs and other tools add useful functionality to LLMs, but are not perfect. (Another NotebookLM infographic).
From Dev To Production
The Tab Generator MCP worked well enough, hosted locally on my desktop. The code executed flawlessly, the formatting was exact, the validation caught errors. Occasionally I would encounter challenges on the LLM side where Claude would forget to call it, fail to display the results, or modify the formatting and ruin the alignment. Usually I could fix this by pointing out the error and asking it to re-perform the task.
This local setup worked as long as my computer was on, the server was running, and I was sitting at my desk. For a personal project, that might be sufficient. But for something I wanted to use from multiple devices, share with others, or demonstrate reliably, I needed to solve the hosting problem. That decision—local versus cloud deployment—turned out to involve its own set of tradeoffs and lessons.
In the final post, I’ll walk through the hosting decision, show you how to actually run the Tab Generator yourself, and wrap up what this entire journey revealed about building with LLMs.
Addendum: The Tab Generator MCP
For those interested in seeing what the Tab Generator MCP looks like, it is public on GitHub. Of particular interest is the main MCP server, in mcp_server.py. You only have to scroll down to line 71 to see all 500 lines of ‘documentation as context for LLMs’.