HTML to Markdown Conversion: The Complete Guide

Learn how to convert HTML to Markdown and vice versa. Covers common conversion challenges, tools, libraries, use cases for CMS migration, and automation techniques.

8 min read · Published April 1, 2025

Why Convert HTML to Markdown?

HTML to Markdown conversion is a critical task in content management, CMS migration, documentation workflows, and web scraping. Markdown is a lightweight markup language that uses plain-text formatting syntax, making it easier to read, write, and version-control than HTML. When migrating content from one CMS to another — such as moving from WordPress to Ghost, from Drupal to Hugo, or from an old enterprise platform to a modern static site generator — converting HTML content to Markdown is often the most practical approach because Markdown is the native format for many modern platforms.

Markdown files are plain text, which means they work seamlessly with Git version control, enabling teams to track content changes alongside code changes. They are also portable across platforms, future-proof, and readable without any special software. Converting HTML to Markdown strips away framework-specific markup, inline styles, and editor artifacts, leaving clean, semantic content that can be re-styled and re-themed for any destination platform. You can use our HTML to Markdown Converter to quickly transform your HTML content into clean Markdown syntax.

Conversion Rules for HTML Elements

Headings

HTML heading tags (h1 through h6) are converted to Markdown by replacing them with hash symbols. An <h1> becomes a single hash followed by a space (# Heading), an <h2> becomes two hashes (## Heading), and so on through <h6> which becomes six hashes (###### Heading). This mapping is straightforward and preserves the document hierarchy. However, some HTML documents may use CSS classes or spans for visual heading effects rather than semantic heading tags, and these non-semantic approaches may require manual adjustment after conversion.
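The heading mapping can be sketched in a few lines of Python. This is only an illustration of the rule, assuming well-formed heading tags with no nested markup; the function name convert_headings is hypothetical, and production converters parse the document tree rather than relying on a regular expression.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6>, with optional attributes.
HEADING_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)

def convert_headings(html: str) -> str:
    """Replace heading elements with ATX-style Markdown headings."""
    def to_atx(match: re.Match) -> str:
        level = int(match.group(1))
        # Collapse line breaks and repeated spaces inside the heading text.
        text = " ".join(match.group(2).split())
        return f"{'#' * level} {text}"
    return HEADING_RE.sub(to_atx, html)

print(convert_headings("<h1>Guide</h1>\n<h3 class='sub'>Install</h3>"))
```

A styled span that only looks like a heading is left untouched by this rule, which is why non-semantic headings need manual review.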

Paragraphs and Text Formatting

HTML paragraphs (<p> tags) are converted to Markdown paragraphs, which are simply blocks of text separated by blank lines. Inline text formatting maps as follows: <strong> and <b> become double asterisks (**bold text**), <em> and <i> become single asterisks (*italic text*), <del> and <s> become double tildes (~~strikethrough~~), and <code> becomes backticks (`inline code`). Nested formatting like bold italic is represented by combining the syntax (***bold italic***). Bold, italic, and inline code are part of core Markdown and convert reliably; strikethrough is an extension (available in GitHub Flavored Markdown, among others), so confirm that the target parser supports it.
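The inline mapping above can be expressed as a lookup table driven by Python's standard html.parser module. This is a minimal sketch, assuming simple inline markup inside a single paragraph; the class and function names are illustrative and do not come from any conversion library.

```python
from html.parser import HTMLParser

# Tag name -> Markdown delimiter. "~~" is an extension, not core Markdown.
DELIMITERS = {
    "strong": "**", "b": "**",
    "em": "*", "i": "*",
    "del": "~~", "s": "~~",
    "code": "`",
}

class InlineConverter(HTMLParser):
    """Collects text and swaps known inline tags for Markdown delimiters."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags (such as <p>) contribute nothing to the output.
        self.parts.append(DELIMITERS.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(DELIMITERS.get(tag, ""))

    def handle_data(self, data):
        self.parts.append(data)

def inline_to_markdown(html: str) -> str:
    parser = InlineConverter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

print(inline_to_markdown("<p>A <strong>bold</strong> and <em>soft</em> word</p>"))
```

Because the opening and closing delimiters are identical, nested tags such as <b><i>text</i></b> combine into ***text*** without special handling.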

Links and Images

HTML anchor tags (<a href="url">text</a>) are converted to Markdown link syntax [text](url). The title attribute, if present, can be included as [text](url "title"). Image tags (<img src="url" alt="text" />) are converted to Markdown image syntax ![alt text](url). These conversions preserve both the destination URL and the descriptive text or alt attribute, maintaining accessibility and SEO value. Relative and absolute URLs are both carried over unchanged. Absolute URLs keep working on the target platform, but relative URLs resolve against the new page location, so check them after migration and rewrite them if the site structure has changed.
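As a rough sketch of the link and image syntax, the helpers below format the Markdown forms and show how a relative URL could be resolved against the old page address with the standard library's urljoin. The helper names and the example.com addresses are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

def md_link(text: str, url: str, title: Optional[str] = None) -> str:
    """Format an inline Markdown link, with an optional quoted title."""
    if title:
        escaped = title.replace('"', '\\"')  # keep quotes inside the title valid
        return f'[{text}]({url} "{escaped}")'
    return f"[{text}]({url})"

def md_image(alt: str, src: str, title: Optional[str] = None) -> str:
    """An image uses link syntax prefixed with an exclamation mark."""
    return "!" + md_link(alt, src, title)

# Hypothetical source page, used to turn a relative path into an absolute URL.
old_page = "https://example.com/blog/post/"
print(md_link("Docs", "/docs", "Read the docs"))
print(md_image("Logo", urljoin(old_page, "../img/logo.png")))
```

Resolving relative paths before conversion is one option; rewriting them to the new site's layout in a post-processing step is another.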

Lists

Unordered lists (<ul> with <li> items) convert to Markdown unordered lists using hyphens or asterisks. Ordered lists (<ol> with <li> items) convert to numbered lists using 1. notation. Markdown has no syntax for bullet styles such as disc, circle, or square, and the original Markdown ignores the starting number of an ordered list (CommonMark-based parsers do honor the number on the first item), so some HTML list attributes may be lost during conversion. Nested lists are represented with indentation, typically two or four spaces per level, and the indentation must match what the destination Markdown parser expects in order to render correctly.
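Nested list output can be sketched with a small recursive function. The input format here (plain strings, or a tuple of text and child items) is an assumption chosen to keep the example short, and render_list is a hypothetical name. Four spaces per level is used because it satisfies both hyphen and numbered markers in CommonMark-style parsers.

```python
def render_list(items, ordered=False, depth=0):
    """Render items as a Markdown list; an item is a str or (str, children)."""
    lines = []
    for number, item in enumerate(items, start=1):
        text, children = item if isinstance(item, tuple) else (item, [])
        marker = f"{number}." if ordered else "-"
        indent = "    " * depth  # four spaces per nesting level
        lines.append(f"{indent}{marker} {text}")
        if children:
            lines.append(render_list(children, ordered, depth + 1))
    return "\n".join(lines)

print(render_list(["Install", ("Configure", ["Set paths", "Set theme"])]))
```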

Tables

Tables are one of the most complex HTML elements to convert to Markdown because tables are not part of core Markdown or CommonMark. Table syntax comes from extensions such as GitHub Flavored Markdown and MultiMarkdown, and it is more limited than HTML's <table> structure. Markdown tables use pipe characters (|) to separate columns and a row of hyphens (---) to separate the header from the body. Cell merging (colspan and rowspan), nested tables, and complex cell content with multiple paragraphs are not supported in GFM-style table syntax and require workarounds such as HTML passthrough or restructuring the content.
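The pipe-table layout can be sketched as follows, assuming the header and rows have already been extracted from the HTML as lists of strings. Both function names are illustrative; the second one is a crude check for merged cells that would signal the need for HTML passthrough.

```python
import re

SPAN_RE = re.compile(r"\b(?:colspan|rowspan)\s*=", re.IGNORECASE)

def needs_html_passthrough(table_html: str) -> bool:
    """True when the table uses merged cells that pipe tables cannot express."""
    return bool(SPAN_RE.search(table_html))

def render_table(header, rows):
    """Render a simple GFM-style pipe table from already-extracted cell text."""
    def line(cells):
        # A literal pipe inside a cell must be escaped.
        escaped = [str(cell).replace("|", "\\|") for cell in cells]
        return "| " + " | ".join(escaped) + " |"
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *map(line, rows)])

print(render_table(["Tool", "Language"], [["Pandoc", "Haskell"], ["Turndown", "JavaScript"]]))
```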

Forms and Interactive Elements

HTML forms, input elements, buttons, and other interactive components do not have Markdown equivalents. When converting content that contains forms, you have several options: embed the raw HTML within the Markdown document (many Markdown parsers allow inline HTML), replace the form with a link to an external form, or note the form's presence as a placeholder for manual recreation. This is a common edge case in CMS migration where HTML content contains elements that Markdown cannot represent natively.
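The placeholder option could look like the sketch below, which swaps each form for an HTML comment and returns the removed markup so it can be reviewed or recreated later. The placeholder text and the function name are assumptions, and the regular expression presumes forms are not nested.

```python
import re

FORM_RE = re.compile(r"<form\b.*?</form>", re.IGNORECASE | re.DOTALL)

def replace_forms(html: str, placeholder: str = "<!-- TODO: recreate form -->"):
    """Return (html with forms replaced, list of the removed form markup)."""
    forms = FORM_RE.findall(html)
    cleaned = FORM_RE.sub(lambda _match: placeholder, html)
    return cleaned, forms

cleaned, forms = replace_forms('<p>Search:</p><form action="/s"><input name="q"></form>')
print(cleaned)
print(len(forms), "form(s) set aside for manual recreation")
```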

Tools and Libraries for Conversion

Turndown.js

Turndown (often written Turndown.js) is one of the most widely used JavaScript libraries for converting HTML to Markdown. It runs in both the browser and Node.js, making it versatile for client-side tools, build scripts, and server-side processing. Turndown uses a rule-based system that maps HTML elements to Markdown syntax, and you can customize or extend these rules, or add plugins, for your specific needs. It handles headings, paragraphs, lists, links, images, code blocks, and most common HTML elements out of the box. Tables and strikethrough are not covered by the default rules and are usually added through the turndown-plugin-gfm plugin. Its wide adoption also means that community plugins exist for many edge cases.

Pandoc

Pandoc is a command-line tool and Haskell library that describes itself as a "universal document converter." It supports conversion between dozens of markup formats, including HTML to Markdown, Markdown to HTML, and many others. Pandoc is particularly powerful for batch processing large numbers of files, handling complex documents with metadata, and producing output in various Markdown flavors. It is available on all major operating systems and can be integrated into CI/CD pipelines and build scripts. Pandoc's handling of tables, footnotes, and metadata is more sophisticated than most JavaScript libraries.
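For build scripts, Pandoc can be driven from Python through subprocess. The sketch below assumes a pandoc executable is installed and on the PATH, and it picks gfm as the output flavor purely as an example; the wrapper function names are hypothetical.

```python
import subprocess
from pathlib import Path

def pandoc_command(src: Path, dst: Path, flavor: str = "gfm") -> list:
    """Build the argument list for an HTML-to-Markdown Pandoc run."""
    return [
        "pandoc",
        "--from", "html",
        "--to", flavor,  # e.g. "gfm", "commonmark", or "markdown"
        "--output", str(dst),
        str(src),
    ]

def convert_with_pandoc(src: Path, dst: Path, flavor: str = "gfm") -> None:
    """Run Pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(src, dst, flavor), check=True)

print(" ".join(pandoc_command(Path("post.html"), Path("post.md"))))
```

Keeping the command construction separate from the call makes it easy to log or dry-run the exact invocation before processing a large batch.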

Remark and the Unified Ecosystem

Remark is a Markdown processor built on the Unified ecosystem, which provides a powerful AST (Abstract Syntax Tree) manipulation framework. While remark is primarily used for processing Markdown, the broader ecosystem includes rehype for HTML processing, and bridge plugins (rehype-remark for HTML to Markdown, remark-rehype for Markdown to HTML) allow conversion in both directions. This approach is ideal for Node.js-based build pipelines and static site generators like Gatsby and Astro. The AST-based approach gives you fine-grained control over the conversion process, enabling custom transformations that go beyond simple element-to-Markdown mapping.

CMS Migration Use Cases

CMS migration is the most common scenario requiring HTML to Markdown conversion. When moving content from WordPress to a static site generator like Hugo, Jekyll, or Astro, the source content is typically stored as HTML in a database. Converting this HTML to Markdown files with front matter allows the content to be managed as plain text, versioned with Git, and processed by the new platform's build system. This migration typically involves exporting the database, parsing the HTML content fields, converting each field to Markdown, and generating the appropriate file structure with front-matter metadata.
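The last step, writing a Markdown file with front matter, might be sketched like this. The field names (title, tags) are placeholders for whatever the source CMS exports, and the function name is illustrative. Values are serialized with json.dumps because JSON strings and arrays are also valid YAML flow syntax, which avoids quoting problems with colons in titles.

```python
import json

def build_markdown_file(meta: dict, body: str) -> str:
    """Combine metadata and converted Markdown into a front-matter document."""
    lines = ["---"]
    for key, value in meta.items():
        # JSON scalars and arrays are valid YAML, so this stays parseable.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines += ["---", "", body.strip(), ""]
    return "\n".join(lines)

print(build_markdown_file({"title": "Hello: World", "tags": ["news", "cms"]}, "# Hello\n\nFirst post."))
```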

Other common migration scenarios include moving from Drupal to Ghost, from Medium to a self-hosted platform, or from an enterprise CMS to a headless CMS with Markdown-based content editing. Each scenario may involve additional considerations like preserving custom shortcodes, handling embedded media, migrating metadata and taxonomies, and maintaining URL structures for SEO. Testing the conversion with a representative sample of content before processing the entire dataset is essential to catch edge cases and format-specific issues early.

Markdown to HTML: The Reverse Conversion

While this guide focuses on HTML to Markdown, the reverse conversion — Markdown to HTML — is equally important. Every Markdown parser performs this conversion, and understanding it helps you anticipate how your converted Markdown will render on the target platform. Different Markdown flavors (CommonMark, GitHub Flavored Markdown, MultiMarkdown, etc.) support different extensions and syntax variations, which means the same Markdown file may render slightly differently across platforms. Testing your converted content on the actual target platform is always recommended.

Libraries like marked.js (JavaScript), markdown-it (JavaScript), and Python-Markdown handle Markdown to HTML conversion. For static site generators, the Markdown-to-HTML conversion is typically built into the build process, and you can configure the parser to enable or disable specific extensions. Understanding both directions of the conversion pipeline — HTML to Markdown and Markdown to HTML — gives you full control over your content's lifecycle across platforms.

Automation Scripts and Batch Processing

For large-scale content migrations, manual conversion is impractical. Automation scripts can process hundreds or thousands of HTML files in minutes, applying consistent conversion rules across all content. A typical batch processing pipeline reads HTML files from a source directory, passes each file through a conversion library (like Turndown.js or Pandoc), applies post-processing rules (like fixing relative URLs, cleaning up whitespace, or adding front-matter), and writes the resulting Markdown files to an output directory with the appropriate naming convention and directory structure.
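The pipeline described above can be sketched as a directory walk that mirrors the source tree into the output directory. The convert argument is a stand-in for whichever converter is actually used (a Turndown or Pandoc wrapper, for instance); convert_tree and the directory names are assumptions for illustration.

```python
from pathlib import Path
from typing import Callable

def convert_tree(src_dir: Path, out_dir: Path, convert: Callable[[str], str]) -> int:
    """Convert every .html file under src_dir to a .md file under out_dir."""
    count = 0
    for html_path in sorted(src_dir.rglob("*.html")):
        # Keep the relative directory structure, changing only the extension.
        target = out_dir / html_path.relative_to(src_dir).with_suffix(".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        markdown = convert(html_path.read_text(encoding="utf-8"))
        target.write_text(markdown.strip() + "\n", encoding="utf-8")
        count += 1
    return count

if __name__ == "__main__":
    # Placeholder converter; swap in a real HTML-to-Markdown function.
    converted = convert_tree(Path("export"), Path("content"), lambda html: html)
    print(f"Converted {converted} file(s)")
```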

Post-processing is a critical step that many teams overlook. Automated conversion rarely produces perfect results for every document. Common post-processing tasks include normalizing heading levels, converting absolute URLs to relative ones, extracting metadata into front-matter, removing leftover HTML artifacts, and validating that all images and links still resolve correctly. Building a post-processing step into your automation pipeline ensures consistent quality and reduces the amount of manual cleanup required.
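Two of these tasks, normalizing heading levels and cleaning up whitespace, can be sketched with regular expressions. The function names are illustrative, and the sketch has a known limitation: it does not skip fenced code blocks, so a line starting with # inside a code sample would also be shifted.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})(?=\s)", re.MULTILINE)

def normalize_headings(markdown: str, top_level: int = 1) -> str:
    """Shift all ATX headings so the shallowest one sits at top_level."""
    levels = [len(m.group(1)) for m in HEADING_RE.finditer(markdown)]
    if not levels:
        return markdown
    shift = top_level - min(levels)

    def adjust(match: re.Match) -> str:
        # Clamp to the valid heading range of 1 to 6.
        return "#" * min(6, max(1, len(match.group(1)) + shift))

    return HEADING_RE.sub(adjust, markdown)

def collapse_blank_lines(markdown: str) -> str:
    """Reduce runs of blank lines to one and end with a single newline."""
    return re.sub(r"\n{3,}", "\n\n", markdown).strip() + "\n"

print(normalize_headings("### Intro\n\ntext\n\n#### Detail\n"))
```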

Edge Cases and Best Practices

HTML content often contains elements that do not map cleanly to Markdown, including complex nested structures, CSS-positioned elements, JavaScript-driven content, embedded iframes, and custom components. For these cases, it is often best to preserve the raw HTML within the Markdown document, as most Markdown parsers support inline HTML passthrough. This approach lets you handle the majority of content with Markdown while keeping complex elements in their original HTML form for manual review and refactoring.

Best practices for HTML to Markdown conversion include: always validate the output by rendering the Markdown on the target platform, handle tables and forms separately as they are the most common problem areas, preserve semantic meaning over visual appearance (the target platform's theme handles the styling), use consistent whitespace and formatting conventions, and version your converted content in Git so you can track changes and roll back if issues are discovered later. Taking a methodical approach to conversion ensures your content migration is successful and maintainable.

Key Takeaways

  • HTML to Markdown conversion is essential for CMS migrations, content portability, and version-controlled documentation workflows.
  • Conversion rules map HTML elements to Markdown syntax: headings use hashes, bold uses double asterisks, links use bracket-parenthesis syntax, and lists use hyphens or numbers.
  • Tables, forms, and interactive elements are the most challenging to convert and may require HTML passthrough or manual restructuring.
  • Turndown.js is a strong choice for JavaScript-based conversion, Pandoc excels at command-line batch processing, and the Unified ecosystem provides AST-level control for complex pipelines.
  • Automation scripts with post-processing steps are essential for large-scale migrations — always validate output on the target platform.
  • Preserve raw HTML for elements that have no Markdown equivalent, and use consistent formatting conventions throughout your converted content.

Frequently Asked Questions

Can I convert HTML with inline styles to Markdown?

Inline styles on HTML elements have no direct Markdown equivalent because Markdown is a content-first format that delegates styling to the platform or theme. During conversion, inline styles are typically stripped from the output. If the styling conveys semantic meaning (like a highlighted note or a colored alert box), consider converting it to a Markdown blockquote, a callout, or a custom syntax supported by your target platform. Most conversion tools strip style attributes by default, but some allow you to preserve them as HTML passthrough if needed.
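If a tool leaves style attributes in place, they can be removed before conversion. The following is a minimal sketch using a regular expression, assuming quoted attribute values; strip_inline_styles is a hypothetical helper, and a parser-based approach is safer for untrusted or malformed markup.

```python
import re

# Matches a whitespace-preceded style attribute with a quoted value.
STYLE_ATTR_RE = re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)

def strip_inline_styles(html: str) -> str:
    """Remove style="..." attributes while leaving other attributes intact."""
    return STYLE_ATTR_RE.sub("", html)

print(strip_inline_styles('<p class="note" style="color: red">Careful</p>'))
```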

How do I handle images during HTML to Markdown conversion?

Images are converted to Markdown syntax ![alt text](src URL), preserving the alt text and source URL from the original HTML. However, you should verify that image URLs are still valid after migration. If your source CMS stored images in a specific directory structure or used absolute URLs, you may need to update the paths to match your new platform's asset management system. Automated post-processing scripts can handle URL rewriting for common migration patterns, saving significant manual effort.
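A URL-rewriting pass over the converted Markdown might look like the sketch below. The old and new prefixes are placeholders for a hypothetical migration, and the pattern only covers inline image syntax with URLs that contain no spaces or parentheses.

```python
import re

# Captures alt text, URL, and any trailing title in ![alt](url "title").
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)([^)]*)\)")

def rewrite_image_urls(markdown: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix in every matching image URL."""
    def swap(match: re.Match) -> str:
        alt, url, rest = match.groups()
        if url.startswith(old_prefix):
            url = new_prefix + url[len(old_prefix):]
        return f"![{alt}]({url}{rest})"
    return IMAGE_RE.sub(swap, markdown)

source = '![Logo](https://old.example.com/wp-content/uploads/logo.png "Logo")'
print(rewrite_image_urls(source, "https://old.example.com/wp-content/uploads/", "/images/"))
```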

Is lossless conversion from HTML to Markdown possible?

Lossless conversion is not possible in the general case because Markdown supports a subset of HTML's expressiveness. Complex table structures, forms, interactive elements, embedded media, and CSS-driven layouts cannot be fully represented in Markdown. However, for typical article content — headings, paragraphs, lists, links, images, code blocks, and basic tables — the conversion is effectively lossless when using a well-configured conversion tool. For content with complex HTML elements, the recommended approach is to convert what Markdown supports and preserve the rest as inline HTML for manual refinement.
