MultiMarkdown Developer's Guide
(Revised 2026-03-13)
Introduction
Why MultiMarkdown v7?
The “initial public commit” for MultiMarkdown v6 was January 18, 2017. I had first started working on it in early 2016, however. v6 was almost a complete rewrite of v5, which still used a PEG. Because of that, there was a great deal of “learning while building”, and the code was sloppier than it should have been in some places.
The API was also not as clean as it could have been, which required including several different header files in order to parse text into HTML within another project.
Differences from MultiMarkdown v6
The code has been almost completely rewritten yet again. It is generally much cleaner and better organized.
The API is cleaner. A single header file (libMultiMarkdown7.h) needs to be included, and it should have everything you need to use MultiMarkdown (MMD) in most other projects. I have used libMultiMarkdown inside a Swift project without any difficulty.
Performance is better. Depending on the test content, it is significantly faster than v6. It is almost always faster than CommonMark (v 0.31.1), and at times gets somewhat close to MD4C. There are a few things to remember:
- MultiMarkdown has more features than CommonMark and MD4C. Some of these will necessarily cause a performance hit when compared to not using those features (such as the more complex output for # Headers #). Compatibility mode turns off most of those features, but in a few cases there will still be a slight performance cost even in compatibility mode.
- MultiMarkdown has to be capable of returning an AST in order to be used in some situations (e.g. as a syntax highlighter for MultiMarkdown Composer). This means it will necessarily be slower than MD4C in all but the most trivial of test cases. That said, I include MD4C in the benchmark testing as something of a “lower bound” to shoot for.
- The benchmark testing (in /dev) performs comparisons between MMD 6, MMD 7, CommonMark, and MD4C (if you have those installed). Additionally, it compares HTML, compatibility mode, and LaTeX output in MMD 7. (NOTE: MD4C installed via Homebrew chokes on the many-links test file. I don’t know why. Building MD4C yourself from the source repo works fine.)
The syntax is generally the same, but there are edge cases that are handled slightly differently. To my knowledge, these represent intentional changes that I believe are more “correct.” At this time, I do not have an exhaustive list. The best way to see the differences is to compare the output from v6 and v7 on the same source file. The test suite has been updated to reflect these changes.
Testing
Updated Test Suite
The standard test suite files from prior versions of MMD have been updated to account for changes from the v7 parser, but also to include new test situations that have come up. Most of the changes are relatively minor.
make
cd build
make
ctest
ctest -V # To see details
Additionally, the “test harness” I use to perform integration testing has been updated to a bash script rather than using a very old Perl script. This allows the test suite to run on Windows without a Perl installation.
Unit Testing
In most of my projects, I try to make heavy use of unit testing and a Test-driven development approach.
MultiMarkdown is a bit different because unit testing would be very labor intensive, and it’s really the integration testing at the end that I care the most about. (I’m not saying that unit testing would be wrong, just that in this case I’m not sure it’s the approach I want to take.)
That said, there are a few components that have unit tests that can be run.
make test
cd build-test
make
./run_tests
Programmatically Generated Test Files
In addition to hand-generated test files in the test suite that have been built up over time, there are some computer generated tests as well. These allow more exhaustively testing situations that might not come up in regular use.
Emphasis Test – generated by _generate_test_emphasis.c to exhaustively test combinations of opening and closing strong/emphasis markers using both * and _
structure.sh – shell script to generate exhaustive combinations of line types to verify parser behavior in determining the overall block structure. The script can be modified to determine how deep the testing goes. 3 line combinations cover the majority of common situations, but some additional edge cases arise the further you go. By the time you get to 5 levels, the generated HTML output is over 11 MB long. As of 2026-03-14, I have tested up to 6 levels deep (135.6 MB).
./tests/structure.sh | ./build/multimarkdown -c | tidy > mmd.html
./tests/structure.sh | cmark | tidy > cmark.html
diff mmd.html cmark.html
(tidy cleans up the whitespace to reduce “false positives” when looking for differences between MMD and CommonMark. And to be clear, I am not implying that CommonMark gets everything correct in this sort of a test. But John MacFarlane has, as usual, done a great job of being consistent and precise. One change that I made based on this test was to revisit my interpretation of how lazy blockquotes interact with other block level elements in Markdown. I had previously assumed (decided??) that lazy blockquote formatting (leaving out the initial ‘>’) applied to any blockquote content. Based on CommonMark’s interpretation, I reread the Markdown syntax “spec” and agreed that it only applied to regular text paragraphs.)
libFuzzer
libFuzzer is used for fuzz testing. Through the course of development, I was able to find a fair number of bugs this way that would have been challenging to find otherwise. This doesn’t work on macOS, so I use vagrant and do the fuzz testing in Ubuntu. Feel free to participate by running the fuzz tester yourself, and send me any examples that trigger an error!
cd fuzz
make
cd build
make
./fuzz_mmd-7
Performance Benchmarking
bench.c builds a small test program that generates a collection of test
files that stress test a MMD parser with a few different scenarios.
(from https://gist.github.com/mity/24822b24d35ef1f998f970965f8c8e53)
It then parses those files several times with MMD v6 and v7, CommonMark, and
MD4C. bench.c will in all likelihood need to be modified to match how
these programs are installed on your machine.
make
cd build
make
cd ../dev
make run
API Changes
API Calls
libMultiMarkdown7.h defines the API for interacting with the MultiMarkdown 7
library. I have tried to clean this file up in order to make it clearer to
read and to include everything required to incorporate MMD in most
projects.
You’ll notice that most of the primary API calls have 4 variants:
1. One requires a FILE pointer to a file that has been opened. This can also be stdin.
2. One requires a path to a file, and MMD handles opening the file for reading.
3. One requires a null-terminated C string (which means the string has to be scanned to determine how long it is).
4. One requires a C string (optionally null-terminated) along with the length of that string (in bytes). Because the length is provided up front, this version does not require an additional pass to determine it, which makes it preferable to variant 3 if you already know the length of the string.
Regardless of how the source text is delivered, MMD expects UTF-8 encoding (with or without a BOM, which is not needed with UTF-8 encoding).
There are several different call classes available:
mmd_process_X – source MMD text is fully processed into another format (e.g. HTML) and sent to the desired FILE pointer (e.g. stdout)
mmd_process_X_to_str – same as above, but instead the result is returned as a char * which represents the output string (or binary data for some formats). The length of the output is placed in *out_len. The returned char * will need to be freed.
mmd_parse_X – these functions return the AST in the form of a mmd_node tree. This can be used as you need, and then cleaned up with mmd_node_tree_free(). You will need to pass a pointer to a read_ctx. Between the returned AST and the updated read_ctx you have everything you need to do whatever you like with the parsed information, though you must also have access to the original source text if you want to extract the text that a node points to, for example.
mmd_ast_X – this is a shortcut function that parses source text and sends a description of the AST to stdout (or another FILE pointer) without requiring you to know about mmd_node. Alternatively, you can use mmd_process_X with an output format of ast.
mmd_hash_X – similar to mmd_ast_X but includes hash values for the nodes. Alternatively, you can use mmd_process_X with an output format of hash. I’m still working on some ways to use these, but the idea is that the hash values for each node in the tree allow you to quickly determine whether two ASTs (or subtrees of the same) are equivalent (identical hash values) or not.
mmd_metadata – process MMD source text and return the read_ctx pointer (which must be freed after use with read_ctx_free()). This allows you to access metadata from a MMD document, along with other extracted information. Currently, you would need read_ctx.h in order to do much with this, but I plan to update this aspect of the API in the future.
custom_seed_rand() – several MMD features use pseudo-random numbers to prevent collisions between footnote and header anchor ids. This function must be called in order to do the initial seeding for the random number generator so that different numbers are generated each time.
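The hash-tree idea described for mmd_hash_X can be sketched roughly as follows. This is only an illustration of the general technique (each node's hash folds in its type, its text, and the hashes of its children, so equal root hashes suggest equal subtrees). The sketch_hnode struct, the function names, and the choice of FNV-1a are all assumptions made for this example, not the structures or hash function MMD actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node for this sketch only; not the real mmd_node. */
typedef struct sketch_hnode {
    uint8_t type;               /* node type */
    const char *text;           /* text owned by this node, or NULL */
    size_t len;                 /* length of text in bytes */
    struct sketch_hnode *next;  /* next sibling */
    struct sketch_hnode *child; /* first child */
    uint64_t hash;              /* filled in by sketch_hnode_hash() */
} sketch_hnode;

#define SKETCH_FNV_OFFSET 14695981039346656037ULL
#define SKETCH_FNV_PRIME  1099511628211ULL

/* Fold len bytes of data into a running 64-bit FNV-1a hash. */
static uint64_t sketch_fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= SKETCH_FNV_PRIME;
    }
    return h;
}

/* Hash a node from its type, its text, and its children's hashes (in order).
 * Offsets are deliberately left out, so identical subtrees at different
 * positions in a document produce the same hash. */
static uint64_t sketch_hnode_hash(sketch_hnode *n)
{
    uint64_t h = sketch_fnv1a(SKETCH_FNV_OFFSET, &n->type, sizeof n->type);

    if (n->text) {
        h = sketch_fnv1a(h, n->text, n->len);
    }
    for (sketch_hnode *c = n->child; c; c = c->next) {
        uint64_t child_hash = sketch_hnode_hash(c);
        h = sketch_fnv1a(h, &child_hash, sizeof child_hash);
    }
    n->hash = h;
    return h;
}
```

With something like this, two trees can be compared by looking at the root hashes first and descending only into children whose hashes differ. Identical hashes indicate equivalence only up to the (small) chance of a collision.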
API Enumerations
libMultiMarkdown7.h also includes the various enums that are used.
output_format – specifies which output format is desired (one at a time)
smart_quote_language – specifies which localization to use for single and double quote pairs, etc. If I am missing something, please let me know!
language – some features in MMD (e.g. footnotes) generate English text in the final HTML file (such as “see footnote”). Let me know if you have another language to contribute.
mmd_options – bitwise flags for controlling various features of MMD. See libMultiMarkdown7.h for details of these options.
All of these values are combined into a single 32-bit unsigned integer. There are a few macros to extract specific values from that combined value if needed:
MMD_OUT_FORMAT_FROM_OPTS()
MMD_SMART_QUOTE_FROM_OPTS()
MMD_LANGUAGE_FROM_OPTS()
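As a rough illustration of how several enum values and a set of bit flags can share one 32-bit unsigned integer, the sketch below packs a format, a smart quote language, a language, and some flags into fixed-width fields and extracts them again. The field widths, the bit positions, and every SKETCH_/sketch_ name are assumptions made up for this example. The real layout is defined in libMultiMarkdown7.h and may well differ, so use the real macros in actual code.

```c
#include <stdint.h>

/* Hypothetical bit layout for this sketch only (NOT taken from
 * libMultiMarkdown7.h):
 *   bits  0-7   output format
 *   bits  8-11  smart quote language
 *   bits 12-15  language
 *   bits 16-31  feature flags */
#define SKETCH_FORMAT_MASK  0x000000FFu
#define SKETCH_QUOTE_SHIFT  8
#define SKETCH_QUOTE_MASK   0x0000000Fu
#define SKETCH_LANG_SHIFT   12
#define SKETCH_LANG_MASK    0x0000000Fu
#define SKETCH_FLAGS_SHIFT  16
#define SKETCH_FLAGS_MASK   0x0000FFFFu

/* Extraction macros, analogous in spirit to MMD_OUT_FORMAT_FROM_OPTS() etc. */
#define SKETCH_OUT_FORMAT_FROM_OPTS(o) \
    ((uint32_t)(o) & SKETCH_FORMAT_MASK)
#define SKETCH_SMART_QUOTE_FROM_OPTS(o) \
    (((uint32_t)(o) >> SKETCH_QUOTE_SHIFT) & SKETCH_QUOTE_MASK)
#define SKETCH_LANGUAGE_FROM_OPTS(o) \
    (((uint32_t)(o) >> SKETCH_LANG_SHIFT) & SKETCH_LANG_MASK)
#define SKETCH_FLAGS_FROM_OPTS(o) \
    (((uint32_t)(o) >> SKETCH_FLAGS_SHIFT) & SKETCH_FLAGS_MASK)

/* Combine the four values into one 32-bit integer, masking each field so an
 * out-of-range value cannot spill into a neighboring field. */
static inline uint32_t sketch_pack_opts(uint32_t format, uint32_t quotes,
                                        uint32_t lang, uint32_t flags)
{
    return (format & SKETCH_FORMAT_MASK)
         | ((quotes & SKETCH_QUOTE_MASK) << SKETCH_QUOTE_SHIFT)
         | ((lang & SKETCH_LANG_MASK) << SKETCH_LANG_SHIFT)
         | ((flags & SKETCH_FLAGS_MASK) << SKETCH_FLAGS_SHIFT);
}
```

Packing everything into one integer keeps function signatures short, since a single options argument carries the format, both language settings, and the feature flags.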
Abstract Syntax Tree
The AST consists of mmd_node structs, which specify a type of node, the
starting offset in the source text (in bytes), the length (in bytes), and
pointers to the next node and the first child node. The tail node points to
the last child node and is primarily used when building the AST so that a
new child can be appended without walking the linked list.
mmd_line_node is the same but adds two more fields specifying where the
actual “content” of the line starts and ends, which allows you to more easily
ignore the markup, such as the leading and trailing ## in a header.
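To make the role of the tail pointer and the byte offsets more concrete, here is a minimal sketch of a node shaped like the description above. The sketch_node struct and all sketch_ functions are hypothetical names invented for this example. The real mmd_node definition lives in libMultiMarkdown7.h and may use different field names and types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node, loosely modeled on the prose description of mmd_node. */
typedef struct sketch_node {
    uint8_t type;               /* node type */
    size_t start;               /* byte offset into the source text */
    size_t len;                 /* length in bytes */
    struct sketch_node *next;   /* next sibling */
    struct sketch_node *child;  /* first child */
    struct sketch_node *tail;   /* last child, used for O(1) append */
} sketch_node;

static sketch_node *sketch_node_new(uint8_t type, size_t start, size_t len)
{
    sketch_node *n = calloc(1, sizeof *n);

    if (n) {
        n->type = type;
        n->start = start;
        n->len = len;
    }
    return n;
}

/* Append a child without walking the sibling list, thanks to tail. */
static void sketch_node_append_child(sketch_node *parent, sketch_node *child)
{
    assert(parent != NULL && child != NULL);

    if (parent->tail) {
        parent->tail->next = child;
    } else {
        parent->child = child;
    }
    parent->tail = child;
}

/* Copy the source text a node points to; the caller frees the result.
 * Returns NULL if the node's range falls outside the source or on
 * allocation failure. */
static char *sketch_node_text(const char *source, size_t source_len,
                              const sketch_node *n)
{
    if (n->start > source_len || n->len > source_len - n->start) {
        return NULL;
    }
    char *out = malloc(n->len + 1);
    if (out) {
        memcpy(out, source + n->start, n->len);
        out[n->len] = '\0';
    }
    return out;
}

/* Free a node, its later siblings, and all of their descendants. */
static void sketch_node_tree_free(sketch_node *n)
{
    while (n) {
        sketch_node *next = n->next;
        sketch_node_tree_free(n->child);
        free(n);
        n = next;
    }
}
```

Because nodes store only offsets and lengths, the tree stays small, but the original source buffer has to remain available for as long as you want to read node text.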
node_type is a value from 1 - 255 that specifies what a specific mmd_node
represents. Values from 1–63 represent LINES in the source text. Values from
64–127 represent block level structures. Values from 128–255 represent span
level tokens.
NOTE: If you customize MMD and add additional node types to the
enumeration list, be sure to assign them to the proper value range and follow any
directions in the comments of libMultiMarkdown7.h.
There are several utility macros to help easily determine what grouping a
specific mmd_node belongs to based on its type:
MMD_NODE_IS_LINE()
MMD_NODE_IS_BLOCK()
MMD_NODE_IS_TOKEN()
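The value ranges described above translate directly into simple range checks. The following sketch shows one way such checks could look. The sketch_ function names are hypothetical stand-ins, and the real MMD_NODE_IS_LINE(), MMD_NODE_IS_BLOCK(), and MMD_NODE_IS_TOKEN() macros in libMultiMarkdown7.h may be implemented differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ranges taken from the text: 1-63 lines, 64-127 blocks, 128-255 tokens.
 * Function names are hypothetical and exist only for this sketch. */
static inline bool sketch_node_is_line(uint8_t type)
{
    return type >= 1 && type <= 63;
}

static inline bool sketch_node_is_block(uint8_t type)
{
    return type >= 64 && type <= 127;
}

static inline bool sketch_node_is_token(uint8_t type)
{
    return type >= 128;  /* a uint8_t cannot exceed 255 */
}

/* Human-readable grouping, e.g. for diagnostics. */
static inline const char *sketch_node_group(uint8_t type)
{
    if (sketch_node_is_line(type))  return "line";
    if (sketch_node_is_block(type)) return "block";
    if (sketch_node_is_token(type)) return "token";
    return "invalid";
}
```

Keeping each grouping in a contiguous range means a node's category can be determined from its type alone, without a lookup table.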
Command-Line Changes
MMD v7 handles arguments from the command-line in a slightly different way from v6, though the most common use cases are unchanged.
multimarkdown [--help] {ast|batch|hash|meta|parse} [options] [Input file names]
The first argument should be an action. If no action is specified, MMD
defaults to the parse action.
ast – Display the AST showing how the document was parsed
batch – Parse each file individually and write to the same filename with a new extension
hash – Display the hash tree for the parsed document
meta – Extract metadata from the document without parsing the rest
parse – Parse the document(s) and export to the desired format
You can then specify different options to adjust the default behavior:
-t FORMAT – Specify the output format (html, mmd, latex, docx, epub, itmz, opml, textbundle, textpack, ast, hash)
-o OUT_FILE – Specify the filename for writing the output
-l LANGUAGE – Specify the language for smart quotes and default markup (en, es, de, fr, nl, sv, he)
-c – Markdown compatibility mode – turn off features that are not in the original Markdown specification. Note that due to bugs in the original Markdown.pl and due to ambiguities in the original “spec”, the output will not exactly match John Gruber’s Markdown.pl.
-C – Generate a complete document
-S – Generate a snippet
-r – Enable file transclusion (“recursive”)
-D – Download assets from the internet (images, CSS) for inclusion in package formats (requires libcurl when compiling)
-E – Embed assets in non-package formats where possible (e.g. embed images directly in HTML as Base64)
-p PATH – Specify a working directory when parsing from stdin (e.g. for transclusion or embedding assets)
-A – Accept all CriticMarkup changes
-R – Reject all CriticMarkup changes
-O – Convert OPML source to MMD text before parsing
-I – Convert iThoughts source to MMD text before parsing
-e – Specify metadata key to extract value from MMD source
-b – Limit parsing to block level only (useful for diagnostics only)
-s – Log some processing time statistics (does not work on Windows)
Cross-Platform Compatibility
macOS
Primary development for MMD is done on macOS, so everything works.
*nix
Additional compilation and testing is frequently done on Ubuntu Linux via a VirtualBox VM, so everything should work properly on *nix machines.
Windows
I was finally able to get a minimal development environment working on a Windows VM using UTM and Windows 11. It is not tested as regularly as macOS or *nix environments, but it works and passes the test suite.
However, performance is not what I would have expected. It’s possible this is because running Windows inside macOS imparts too much of a performance hit. (Yet running on an Ubuntu VM is just fine…) I worry it’s something more integral to the code and that the Windows build needs more time spent on profiling to help improve things. As I do not have Windows hardware, and have no intention of purchasing Windows hardware, any contributions here are appreciated!
Others
MultiMarkdown is written in C, and is intended to be compilable on any (reasonable) operating system.
The only external library it uses is libcurl, but only if it is found.
CMake is normally used as the build system, but MMD could of course be compiled by manually specifying the source files. So this is not a hard requirement either.
If you need to compile MMD on a different system and find that I have done something that prevents you from doing that, let me know and I will consider whether anything can be changed.
Continuous Integration
I use GitHub Actions to run the test suite with every push to the public repository. This includes:
- compiling on macOS, Ubuntu, and Windows with clang and gcc
- running ctest on all 3 operating systems
At the very least, this warns me if I push a commit that breaks compilation or expected output on the test suite on any of the three systems.
Contributing
Bug Reports and Suggestions
I welcome examples of source text that cause MMD to misbehave. You can contribute them at the GitHub issues page.
Same thing with suggestions for new features. Be warned, however, that I rarely add new features to the MMD syntax unless they are truly valuable to a wide range of users.
Code Contributions
Pull requests can be managed through GitHub. However, if your request is more than a straightforward bug fix, it is unlikely that I will accept it directly. It is more likely that I would rewrite the suggested code to ensure it matches the style and structure of existing code and my expectations. So you may be better off discussing the idea with me first before worrying too much about a pull request.