pandoc [options] [input-file]…
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org-mode, DocBook, txt2tags, EPUB and Word docx; and it can write plain text, markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, InDesign ICML, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX is installed.
Pandoc’s enhanced version of markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscript, subscript, strikeout, title blocks, automatic tables of contents, embedded LaTeX math, citations, and markdown inside HTML block elements. (These enhancements, described below under Pandoc’s markdown, can be disabled using the markdown_strict input or output format.)
In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.
pandoc
If no input-file is specified, input is read from stdin. Otherwise, the input-files are concatenated (with a blank line between each) and used as input. Output goes to stdout by default (though output to stdout is disabled for the odt, docx, epub, and epub3 output formats). For output to a file, use the -o option:
pandoc -o output.html input.txt
By default, pandoc produces a document fragment, not a standalone document with a proper header and footer. To produce a standalone document, use the -s or --standalone flag:
pandoc -s -o output.html input.txt
For more information on how standalone documents are produced, see Templates, below.
Instead of a file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP:
pandoc -f html -t markdown http://www.fsf.org
If multiple input files are given, pandoc will concatenate them all (with blank lines between them) before parsing. This feature is disabled for binary input formats such as EPUB and docx.
The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -r/--read or -f/--from options, the output format using the -w/--write or -t/--to options. Thus, to convert hello.txt from markdown to LaTeX, you could type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from HTML to markdown:
pandoc -f html -t markdown hello.html
Supported output formats are listed below under the -t/--to option. Supported input formats are listed below under the -f/--from option. Note that the rst, textile, latex, and html readers are not complete; there are some constructs that they do not parse.
If the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the input and output filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX. If no output file is specified (so that output goes to stdout), or if the output file’s extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from stdin), or if the input files’ extensions are unknown, the input format will be assumed to be markdown unless explicitly specified.
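The fallback rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not pandoc's implementation: the function names are hypothetical, and the extension table contains just the two extensions used in the examples above, whereas pandoc's real table is larger.

```python
import os

# Hypothetical, deliberately tiny extension table (assumption for illustration).
KNOWN_EXTENSIONS = {".html": "html", ".tex": "latex"}

def guess_format(filename, default):
    """Return a format name for filename, or default if it cannot be guessed."""
    if filename is None:          # no file given: stdin or stdout is in use
        return default
    extension = os.path.splitext(filename)[1].lower()
    return KNOWN_EXTENSIONS.get(extension, default)

def guess_input_format(filename=None):
    # Unknown or missing input extensions fall back to markdown.
    return guess_format(filename, "markdown")

def guess_output_format(filename=None):
    # Unknown or missing output extensions fall back to HTML.
    return guess_format(filename, "html")
```

Under these assumptions, guess_input_format("hello.txt") falls back to markdown and guess_output_format("hello.tex") yields latex, matching the example command above.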
Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
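For readers who wrap pandoc in a script rather than a shell pipeline, the two iconv steps amount to a decode followed by an encode. The sketch below shows that transcoding in Python; the function names are hypothetical, and the local encoding is passed in explicitly because the script cannot know it.

```python
def to_utf8(raw: bytes, local_encoding: str) -> bytes:
    """Transcode bytes in the local encoding to UTF-8 (what pandoc expects)."""
    return raw.decode(local_encoding).encode("utf-8")

def from_utf8(raw: bytes, local_encoding: str) -> bytes:
    """Transcode pandoc's UTF-8 output back to the local encoding."""
    return raw.decode("utf-8").encode(local_encoding)
```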
Note that in some output formats (such as HTML, LaTeX, ConTeXt, RTF, OPML, DocBook, and Texinfo), information about the character encoding is included in the document header, which will only be included if you use the -s/--standalone option.
Earlier versions of pandoc came with a program, markdown2pdf, that used pandoc and pdflatex to produce a PDF. This is no longer needed, since pandoc can now produce PDF output itself. To produce a PDF, simply specify an output file with a .pdf extension. Pandoc will create a LaTeX file and use pdflatex (or another engine, see --latex-engine) to convert it to PDF:
pandoc test.txt -o test.pdf
Production of a PDF requires that a LaTeX engine be installed (see --latex-engine, below), and assumes that the following LaTeX packages are available: amssymb, amsmath, ifxetex, ifluatex, listings (if the --listings option is used), fancyvrb, longtable, booktabs, url, graphicx, hyperref, ulem, babel (if the lang variable is set), fontspec (if xelatex or lualatex is used as the LaTeX engine), xltxtra and xunicode (if xelatex is used).
hsmarkdown
A user who wants a drop-in replacement for Markdown.pl may create a symbolic link to the pandoc executable called hsmarkdown. When invoked under the name hsmarkdown, pandoc will behave as if invoked with -f markdown_strict --email-obfuscation=references, and all command-line options will be treated as regular arguments. However, this approach does not work under Cygwin, due to problems with its simulation of symbolic links.
-f FORMAT, -r FORMAT, --from=FORMAT, --read=FORMAT
Specify input format. FORMAT can be native (native Haskell), json (JSON version of native AST), markdown (pandoc’s extended markdown), markdown_strict (original unextended markdown), markdown_phpextra (PHP Markdown Extra extended markdown), markdown_github (GitHub extended markdown), textile (Textile), rst (reStructuredText), html (HTML), docbook (DocBook), t2t (txt2tags), docx (docx), epub (EPUB), opml (OPML), org (Emacs Org-mode), mediawiki (MediaWiki markup), twiki (TWiki markup), haddock (Haddock markup), or latex (LaTeX). If +lhs is appended to markdown, rst, latex, or html, the input will be treated as literate Haskell source: see Literate Haskell support, below. Markdown syntax extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. So, for example, markdown_strict+footnotes+definition_lists is strict markdown with footnotes and definition lists enabled, and markdown-pipe_tables+hard_line_breaks is pandoc’s markdown without pipe tables and with hard line breaks. See Pandoc’s markdown, below, for a list of extensions and their names.
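To make the +EXTENSION / -EXTENSION notation concrete, here is a minimal Python sketch that splits such a format string into its base format and the lists of enabled and disabled extensions. The function name is hypothetical and the parsing is an assumption about the notation as described above, not pandoc's own parser; it treats a +lhs suffix like any other modifier.

```python
import re

def parse_format_spec(spec):
    """Split 'base+ext-ext...' into (base, enabled, disabled)."""
    # The base format is everything before the first '+' or '-'.
    base = re.match(r"[^+-]*", spec).group(0)
    enabled, disabled = [], []
    # Each modifier is a sign followed by an extension name.
    for sign, name in re.findall(r"([+-])([^+-]+)", spec[len(base):]):
        (enabled if sign == "+" else disabled).append(name)
    return base, enabled, disabled
```

Applied to the two examples above, the first yields the base markdown_strict with footnotes and definition_lists enabled, and the second yields the base markdown with pipe_tables disabled and hard_line_breaks enabled.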
-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT
Specify output format. FORMAT can be native (native Haskell), json (JSON version of native AST), plain (plain text), markdown (pandoc’s extended markdown), markdown_strict (original unextended markdown), markdown_phpextra (PHP Markdown Extra extended markdown), markdown_github (GitHub extended markdown), rst (reStructuredText), html (XHTML 1), html5 (HTML 5), latex (LaTeX), beamer (LaTeX beamer slide show), context (ConTeXt), man (groff man), mediawiki (MediaWiki markup), dokuwiki (DokuWiki markup), textile (Textile), org (Emacs Org-Mode), texinfo (GNU Texinfo), opml (OPML), docbook (DocBook), opendocument (OpenDocument), odt (OpenOffice text document), docx (Word docx), haddock (Haddock markup), rtf (rich text format), epub (EPUB v2 book), epub3 (EPUB v3), fb2 (FictionBook2 e-book), asciidoc (AsciiDoc), icml (InDesign ICML), slidy (Slidy HTML and javascript slide show), slideous (Slideous HTML and javascript slide show), dzslides (DZSlides HTML5 + javascript slide show), revealjs (reveal.js HTML5 + javascript slide show), s5 (S5 HTML and javascript slide show), or the path of a custom Lua writer (see Custom writers, below). Note that odt, docx, epub, and epub3 output will not be directed to stdout; an output filename must be specified using the -o/--output option. If +lhs is appended to markdown, rst, latex, beamer, html, or html5, the output will be rendered as literate Haskell source: see Literate Haskell support, below. Markdown syntax extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name, as described above under -f.
-o FILE, --output=FILE
Write output to FILE instead of stdout. If FILE is -, output will go to stdout. (Exception: if the output format is odt, docx, epub, or epub3, output to stdout is disabled.)
--data-dir=DIRECTORY
Specify the user data directory to search for pandoc data files. If this option is not specified, the default user data directory will be used. This is $HOME/.pandoc on Unix, C:\Documents And Settings\USERNAME\Application Data\pandoc in Windows XP, and C:\Users\USERNAME\AppData\Roaming\pandoc in Windows 7. (You can find the default user data directory on your system by looking at the output of pandoc --version.) A reference.odt, reference.docx, default.csl, epub.css, templates, slidy, slideous, or s5 directory placed in this directory will override pandoc’s normal defaults.
-v
, --version
-h
, --help
-R
, --parse-raw
-R
is not specified.)
-S, --smart
Produce typographically correct output, converting straight quotes to curly quotes, --- to em-dashes, -- to en-dashes, and ... to ellipses. Nonbreaking spaces are inserted after certain abbreviations, such as “Mr.” (Note: This option is significant only when the input format is markdown, markdown_strict, textile or twiki. It is selected automatically when the input format is textile or the output format is latex or context, unless --no-tex-ligatures is used.)
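The dash and ellipsis conversions described for --smart can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, it works on plain strings rather than on a parsed document, and it leaves out quotes and nonbreaking spaces entirely.

```python
def smart_punctuation(text):
    """Convert ---, -- and ... to an em-dash, an en-dash and an ellipsis."""
    # Longest pattern first, so "---" is not consumed as "--" followed by "-".
    replacements = (("---", "\u2014"), ("--", "\u2013"), ("...", "\u2026"))
    for ascii_form, typographic in replacements:
        text = text.replace(ascii_form, typographic)
    return text
```

The ordering of the replacements is the only design point here: handling the two-hyphen form first would turn every three-hyphen sequence into an en-dash plus a stray hyphen.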
--old-dashes
-
before a numeral is an en-dash, and --
is an em-dash. This option is selected automatically for textile
input.
--base-header-level=
NUMBER--indented-code-classes=
CLASSESperl,numberLines
or haskell
. Multiple classes may be separated by spaces or commas.
--default-image-extension=EXTENSION
--filter=EXECUTABLE
Specify an executable to be used as a filter transforming the Pandoc AST after the input is parsed and before the output is written. The executable should read JSON from stdin and write JSON to stdout. The JSON must be formatted like pandoc’s own JSON input and output. The name of the output format will be passed to the filter as the first argument. Hence,
pandoc --filter ./caps.py -t latex
is equivalent to
pandoc -t json | ./caps.py latex | pandoc -f json -t latex
The latter form may be useful for debugging filters.
Filters may be written in any language. Text.Pandoc.JSON exports toJSONFilter to facilitate writing filters in Haskell. Those who would prefer to write filters in Python can use the module pandocfilters, installable from PyPI. See github.com/jgm/pandocfilters for the module and several examples. Note that the EXECUTABLE will be sought in the user’s PATH, and not in the working directory, if no directory is provided. If you want to run a script in the working directory, preface the filename with ./.
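As a rough illustration of the read-JSON, transform, write-JSON contract, the sketch below upper-cases text using only the Python standard library. It assumes that string elements appear in the JSON as objects of the form {"t": "Str", "c": "text"}; check the output of pandoc -t json for your version before relying on that shape. The function names are hypothetical, and this is not the pandocfilters API.

```python
import json

def upcase_strings(node):
    """Recursively upper-case the content of every Str element in a decoded AST."""
    if isinstance(node, list):
        return [upcase_strings(child) for child in node]
    if isinstance(node, dict):
        # Assumed element shape: {"t": "Str", "c": "some text"}.
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {**node, "c": node["c"].upper()}
        return {key: upcase_strings(value) for key, value in node.items()}
    return node  # numbers, plain strings, booleans, None are left untouched

def run_filter(in_stream, out_stream):
    """Read a JSON AST from in_stream and write the transformed AST to out_stream."""
    json.dump(upcase_strings(json.load(in_stream)), out_stream)
```

To use this as an executable filter, a script would import sys and call run_filter(sys.stdin, sys.stdout); the output format name passed by pandoc would then be available as sys.argv[1].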
-M
KEY[=VAL], --metadata=
KEY[:VAL]--variable
, --metadata
causes template variables to be set. But unlike --variable
, --metadata
affects the metadata of the underlying document (which is accessible from filters and may be printed in some output formats).
--normalize
Str
or Emph
elements, for example, and remove repeated Space
s.
-p
, --preserve-tabs
--tab-stop=
NUMBER--track-changes=
accept|reject|allinsertion
and deletion
classes, respectively. The author and time of change are included. all is useful for scripting: only accepting changes from a certain reviewer, say, or before a certain date. This option only affects the docx reader.
--extract-media=
DIR-s
, --standalone
pdf
, epub
, epub3
, fb2
, docx
, and odt
output.
--template=
FILE--standalone
. See Templates below for a description of template syntax. If no extension is specified, an extension corresponding to the writer will be added, so that --template=special
looks for special.html
for HTML output. If the template is not found, pandoc will search for it in the user data directory (see --data-dir
). If this option is not used, a default template appropriate for the output format will be used (see -D/--print-default-template
).
-V
KEY[=VAL], --variable=
KEY[:VAL]--template
option is used to specify a custom template, since pandoc automatically sets the variables used in the default templates. If no VAL is specified, the key will be given the value true
.
-D
FORMAT, --print-default-template=
FORMAT-t
for a list of possible FORMATs.)
--print-default-data-file=
FILE--no-wrap
--columns
=NUMBER--toc
, --table-of-contents
latex
, context
, and rst
, an instruction to create one) in the output document. This option has no effect on man
, docbook
, slidy
, slideous
, s5
, docx
, or odt
output.
--toc-depth=
NUMBER--no-highlight
--highlight-style
=STYLEpygments
(the default), kate
, monochrome
, espresso
, zenburn
, haddock
, and tango
.
-H
FILE, --include-in-header=
FILE--standalone
.
-B
FILE, --include-before-body=
FILE<body>
tag in HTML, or the \begin{document}
command in LaTeX). This can be used to include navigation bars or banners in HTML documents. This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone
.
-A
FILE, --include-after-body=
FILE</body>
tag in HTML, or the \end{document}
command in LaTeX). This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone
.
--self-contained
data:
URIs to incorporate the contents of linked scripts, stylesheets, images, and videos. The resulting file should be “self-contained,” in the sense that it needs no external files and no net access to be displayed properly by a browser. This option works only with HTML output formats, including html, html5, html+lhs, html5+lhs, s5, slidy, slideous, dzslides, and revealjs. Scripts, images, and stylesheets at absolute URLs will be downloaded; those at relative URLs will be sought relative to the working directory (if the first source file is local) or relative to the base URL (if the first source file is remote). --self-contained does not work with --mathjax.
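A data: URI carries a resource inline, typically as base64 text prefixed by its media type. The sketch below shows how one can be built in Python; the function name is hypothetical, and this illustrates the general data: URI form rather than what pandoc itself emits for any given file.

```python
import base64

def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI with the given media type."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, to_data_uri(b"body{}", "text/css") produces a string beginning with data:text/css;base64, that a browser can load without fetching a separate stylesheet.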
--offline
--self-contained
.
-5
, --html5
html
. (Deprecated: Use the html5
output format instead.)
--html-q-tags
<q>
tags for quotes in HTML.
--ascii
--reference-links
--atx-headers
--chapters
beamer
is the output format, top-level headers will become \part{..}
.
-N
, --number-sections
unnumbered
will never be numbered, even if --number-sections
is specified.
--number-offset
=NUMBER[,NUMBER,…],--number-offset=5
. If your document starts with a level-2 header which you want to be numbered “1.5”, specify --number-offset=1,4
. Offsets are 0 by default. Implies --number-sections
.
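The arithmetic behind the offsets can be sketched as follows: each offset is the starting value of the counter for one header level, and the first header at a level increments that counter by one. This is a hypothetical helper written to match the "1.5" example above, not pandoc's numbering code.

```python
def first_header_number(offsets, level):
    """Return the number given to the first header of the given level."""
    # Pad with zeros, since offsets are 0 by default.
    counters = list(offsets) + [0] * (level - len(offsets))
    counters[level - 1] += 1  # the first header at this level bumps its counter
    return ".".join(str(n) for n in counters[:level])
```

With offsets 1,4 and a document that starts with a level-2 header, this gives "1.5"; with a single offset of 5, the first top-level header gets the number one past the offset.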
--no-tex-ligatures
--smart
is selected automatically for LaTeX and ConTeXt output, but it must be specified explicitly if --no-tex-ligatures
is selected. If you use literal curly quotes, dashes, and ellipses in your source, then you may want to use --no-tex-ligatures
without --smart
.
--listings
-i
, --incremental
--slide-level
=NUMBERbeamer
, s5
, slidy
, slideous
, dzslides
). Headers above this level in the hierarchy are used to divide the slide show into sections; headers below this level create subheads within a slide. The default is to set the slide level based on the contents of the document; see Structuring the slide show, below.
--section-divs
<div>
tags (or <section>
tags in HTML5), and attach identifiers to the enclosing <div>
(or <section>
) rather than the header itself. See Section identifiers, below.
--email-obfuscation=none|javascript|references
Specify a method for obfuscating mailto: links in HTML documents. none leaves mailto: links as they are. javascript obfuscates them using javascript. references obfuscates them by printing their letters as decimal or hexadecimal character references.
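The references method relies on the fact that a browser decodes character references back into ordinary letters, while the raw HTML no longer contains the address as plain text. A minimal sketch, assuming decimal references only (the description above also allows hexadecimal ones) and using a hypothetical function name:

```python
def obfuscate_with_references(address):
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)
```

A browser renders the result exactly as the original address, since each reference decodes to the character it stands for.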
--id-prefix
=STRING-T
STRING, --title-prefix=
STRING--standalone
.
-c
URL, --css=
URL--reference-odt=
FILEreference.odt
in the user data directory (see --data-dir
). If this is not found either, sensible defaults will be used.
--reference-docx=
FILEreference.docx
in the user data directory (see --data-dir
). If this is not found either, sensible defaults will be used. The following styles are used by pandoc: [paragraph] Normal, Compact, Title, Subtitle, Authors, Date, Abstract, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5, Block Quote, Definition Term, Definition, Bibliography, Body Text, Table Caption, Image Caption; [character] Default Paragraph Font, Body Text Char, Verbatim Char, Footnote Ref, Link.
--epub-stylesheet=
FILEepub.css
in the user data directory (see --data-dir
). If it is not found there, sensible defaults will be used.
--epub-cover-image=
FILEcover-image
in a YAML metadata block (see EPUB Metadata, below).
--epub-metadata=FILE
Look in the specified XML file for metadata for the EPUB. The file should contain a series of Dublin Core elements, as documented at dublincore.org/documents/dces/. For example:
<dc:rights>Creative Commons</dc:rights>
<dc:language>es-AR</dc:language>
By default, pandoc will include the following metadata elements: <dc:title> (from the document title), <dc:creator> (from the document authors), <dc:date> (from the document date, which should be in ISO 8601 format), <dc:language> (from the lang variable, or, if it is not set, the locale), and <dc:identifier id="BookId"> (a randomly generated UUID). Any of these may be overridden by elements in the metadata file.
Note: if the source document is markdown, a YAML metadata block in the document can be used instead. See below under EPUB Metadata.
--epub-embed-font=FILE
Embed the specified font in the EPUB. This option can be repeated to embed multiple fonts. To use embedded fonts, you will need to add declarations like the following to your CSS (see --epub-stylesheet):
@font-face {
font-family: DejaVuSans;
font-style: normal;
font-weight: normal;
src:url("DejaVuSans-Regular.ttf");
}
@font-face {
font-family: DejaVuSans;
font-style: normal;
font-weight: bold;
src:url("DejaVuSans-Bold.ttf");
}
@font-face {
font-family: DejaVuSans;
font-style: italic;
font-weight: normal;
src:url("DejaVuSans-Oblique.ttf");
}
@font-face {
font-family: DejaVuSans;
font-style: italic;
font-weight: bold;
src:url("DejaVuSans-BoldOblique.ttf");
}
body { font-family: "DejaVuSans"; }
--epub-chapter-level=
NUMBER--latex-engine=
pdflatex|lualatex|xelatexpdflatex
. If the engine is not in your PATH, the full path of the engine may be specified here.
--bibliography=
FILEbibliography
field in the document’s metadata to FILE, overriding any value set in the metadata, and process citations using pandoc-citeproc
. (This is equivalent to --metadata bibliography=FILE --filter pandoc-citeproc
.)
--csl=
FILEcsl
field in the document’s metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata csl=FILE
.)
--citation-abbreviations=
FILEcitation-abbreviations
field in the document’s metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata citation-abbreviations=FILE
.)
--natbib
pandoc-citeproc
filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with pdflatex and bibtex.
--biblatex
pandoc-citeproc
filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with pdflatex and bibtex or biber.
-m
[URL], --latexmathml
[=URL]LaTeXMathML.js
script, provide a URL. If no URL is provided, the contents of the script will be inserted directly into the HTML header, preserving portability at the price of efficiency. If you plan to use math on several pages, it is much better to link to a copy of the script, so it can be cached.
--mathml
[=URL]docbook
as well as html
and html5
). In standalone html
output, a small javascript (or a link to such a script if a URL is supplied) will be inserted that allows the MathML to be viewed on some browsers.
--jsmath
[=URL]jsMath/easy/load.js
); if provided, it will be linked to in the header of standalone HTML documents. If a URL is not provided, no link to the jsMath load script will be inserted; it is then up to the author to provide such a link in the HTML template.
--mathjax
[=URL]MathJax.js
load script. If a URL is not provided, a link to the MathJax CDN will be inserted.
--gladtex
<eq>
tags in HTML output. These can then be processed by gladTeX to produce links to images of the typeset formulas.
--mimetex
[=URL]/cgi-bin/mimetex.cgi
.
--webtex[=URL]
--katex[=URL]
Use KaTeX to display embedded TeX math in HTML output. The URL should point to the katex.js load script. If a URL is not provided, a link to the KaTeX CDN will be inserted.
--katex-stylesheet=URL
The URL should point to the katex.css stylesheet. If this option is not specified, a link to the KaTeX CDN will be inserted. Note that this option does not imply --katex.
--dump-args
-o
option, or -
(for stdout) if no output file was specified. The remaining lines contain the command-line arguments, one per line, in the order they appear. These do not include regular Pandoc options and their arguments, but do include any options appearing after a --
separator at the end of the line.
--ignore-args
Ignore command-line arguments (for use in wrapper scripts). Regular Pandoc options are not ignored. Thus, for example,
pandoc --ignore-args -o foo.html -s foo.txt -- -e latin1
is equivalent to
pandoc -o foo.html -s
When the -s/--standalone option is used, pandoc uses a template to add header and footer material that is needed for a self-standing document. To see the default template that is used, just type
pandoc -D FORMAT
where FORMAT is the name of the output format. A custom template can be specified using the --template option. You can also override the system default templates for a given output format FORMAT by putting a file templates/default.FORMAT in the user data directory (see --data-dir, above). Exceptions: For odt output, customize the default.opendocument template. For pdf output, customize the default.latex template.
Templates may contain variables. Variable names are sequences of alphanumerics, -, and _, starting with a letter. A variable name surrounded by $ signs will be replaced by its value. For example, the string $title$ in
<title>$title$</title>
will be replaced by the document title.
To write a literal $ in a template, use $$.
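The two substitution rules above (a $name$ placeholder and the $$ escape) can be sketched with a single regular expression in Python. This covers only those two rules; the function name is hypothetical, and rendering an unset variable as an empty string is an assumption of this sketch.

```python
import re

# Either the "$$" escape, or a variable name between "$" signs. Names are
# alphanumerics, "-" and "_", starting with a letter, as described above.
PLACEHOLDER = re.compile(r"\$\$|\$([A-Za-z][A-Za-z0-9_-]*)\$")

def render(template, variables):
    """Substitute $name$ placeholders and turn $$ into a literal $."""
    def substitute(match):
        name = match.group(1)
        if name is None:                      # matched the "$$" escape
            return "$"
        return str(variables.get(name, ""))   # assumption: unset means empty
    return PLACEHOLDER.sub(substitute, template)
```

Calling render("<title>$title$</title>", {"title": "Hello"}) gives <title>Hello</title>, mirroring the example above.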
Some variables are set automatically by pandoc. These vary somewhat depending on the output format, but include metadata fields (such as title, author, and date) as well as the following:
header-includes
-H/--include-in-header
(may have multiple values)
toc
--toc/--table-of-contents
was specified
include-before
-B/--include-before-body
(may have multiple values)
include-after
-A/--include-after-body
(may have multiple values)
body
lang
slidy-url
www.w3.org/Talks/Tools/Slidy2
)
slideous-url
slideous
)
s5-url
s5/default
)
revealjs-url