pdoc.markdown2

A fast and complete Python implementation of Markdown.

[from http://daringfireball.net/projects/markdown/]

Markdown is a text-to-HTML filter; it translates an easy-to-read / easy-to-write structured text format into HTML. Markdown's text format is most similar to that of plain text email, and supports features such as headers, emphasis, code blocks, blockquotes, and links.

Markdown's syntax is designed not as a generic markup language, but specifically to serve as a front-end to (X)HTML. You can use span-level HTML tags anywhere in a Markdown document, and you can use block-level HTML tags (like <div> and <table>) as well.

Module usage:

>>> import markdown2
>>> markdown2.markdown("*boo!*")  # or use `html = markdown_path(PATH)`
u'<p><em>boo!</em></p>\n'

>>> markdowner = Markdown()
>>> markdowner.convert("*boo!*")
u'<p><em>boo!</em></p>\n'
>>> markdowner.convert("**boom!**")
u'<p><strong>boom!</strong></p>\n'

This implementation of Markdown implements the full "core" syntax plus a number of extras (e.g., code syntax coloring, footnotes) as described on https://github.com/trentm/python-markdown2/wiki/Extras.

   1# fmt: off
   2# flake8: noqa
   3# type: ignore
   4# Taken from here: https://github.com/trentm/python-markdown2/tree/6269c1f5f5e812f85ffb8524b8bf10b615579abf
   5
   6#!/usr/bin/env python
   7# Copyright (c) 2012 Trent Mick.
   8# Copyright (c) 2007-2008 ActiveState Corp.
   9# License: MIT (http://www.opensource.org/licenses/mit-license.php)
  10
  11r"""A fast and complete Python implementation of Markdown.
  12
  13[from http://daringfireball.net/projects/markdown/]
  14> Markdown is a text-to-HTML filter; it translates an easy-to-read /
  15> easy-to-write structured text format into HTML.  Markdown's text
  16> format is most similar to that of plain text email, and supports
  17> features such as headers, *emphasis*, code blocks, blockquotes, and
  18> links.
  19>
  20> Markdown's syntax is designed not as a generic markup language, but
  21> specifically to serve as a front-end to (X)HTML. You can use span-level
  22> HTML tags anywhere in a Markdown document, and you can use block level
  23> HTML tags (like <div> and <table> as well).
  24
  25Module usage:
  26
  27    >>> import markdown2
  28    >>> markdown2.markdown("*boo!*")  # or use `html = markdown_path(PATH)`
  29    u'<p><em>boo!</em></p>\n'
  30
  31    >>> markdowner = Markdown()
  32    >>> markdowner.convert("*boo!*")
  33    u'<p><em>boo!</em></p>\n'
  34    >>> markdowner.convert("**boom!**")
  35    u'<p><strong>boom!</strong></p>\n'
  36
  37This implementation of Markdown implements the full "core" syntax plus a
  38number of extras (e.g., code syntax coloring, footnotes) as described on
  39<https://github.com/trentm/python-markdown2/wiki/Extras>.
  40"""
  41
  42cmdln_desc = """A fast and complete Python implementation of Markdown, a
  43text-to-HTML conversion tool for web writers.
  44
  45Supported extra syntax options (see -x|--extras option below and
  46see <https://github.com/trentm/python-markdown2/wiki/Extras> for details):
  47
  48* break-on-newline: Replace single new line characters with <br> when True
  49* code-friendly: Disable _ and __ for em and strong.
  50* cuddled-lists: Allow lists to be cuddled to the preceding paragraph.
  51* fenced-code-blocks: Allows a code block to not have to be indented
  52  by fencing it with '```' on a line before and after. Based on
  53  <http://github.github.com/github-flavored-markdown/> with support for
  54  syntax highlighting.
  55* footnotes: Support footnotes as in use on daringfireball.net and
  56  implemented in other Markdown processors (though not in Markdown.pl v1.0.1).
  57* header-ids: Adds "id" attributes to headers. The id value is a slug of
  58  the header text.
  59* highlightjs-lang: Allows specifying the language used for syntax
  60  highlighting when using fenced-code-blocks and highlightjs.
  61* html-classes: Takes a dict mapping html tag names (lowercase) to a
  62  string to use for a "class" tag attribute. Currently only supports "img",
  63  "table", "pre" and "code" tags. Add an issue if you require this for other
  64  tags.
  65* link-patterns: Auto-link given regex patterns in text (e.g. bug number
  66  references, revision number references).
  67* markdown-in-html: Allow the use of `markdown="1"` in a block HTML tag to
  68  have markdown processing be done on its contents. Similar to
  69  <http://michelf.com/projects/php-markdown/extra/#markdown-attr> but with
  70  some limitations.
  71* metadata: Extract metadata from a leading '---'-fenced block.
  72  See <https://github.com/trentm/python-markdown2/issues/77> for details.
  73* nofollow: Add `rel="nofollow"` to all `<a>` tags with an href. See
  74  <http://en.wikipedia.org/wiki/Nofollow>.
  75* numbering: Support for generic counters. Non-standard extension to
  76  allow sequential numbering of figures, tables, equations, exhibits etc.
  77* pyshell: Treats unindented Python interactive shell sessions as <code>
  78  blocks.
  79* smarty-pants: Replaces ' and " with curly quotation marks or curly
  80  apostrophes.  Replaces --, ---, ..., and . . . with en dashes, em dashes,
  81  and ellipses.
  82* spoiler: A special kind of blockquote commonly hidden behind a
  83  click on SO. Syntax per <http://meta.stackexchange.com/a/72878>.
  84* strike: text inside of double tilde is ~~strikethrough~~
  85* tag-friendly: Requires atx style headers to have a space between the # and
  86  the header text. Useful for applications that require twitter style tags to
  87  pass through the parser.
  88* tables: Tables using the same format as GFM
  89  <https://help.github.com/articles/github-flavored-markdown#tables> and
  90  PHP-Markdown Extra <https://michelf.ca/projects/php-markdown/extra/#table>.
  91* toc: The returned HTML string gets a new "toc_html" attribute which is
  92  a Table of Contents for the document. (experimental)
  93* use-file-vars: Look for an Emacs-style markdown-extras file variable to turn
  94  on Extras.
  95* wiki-tables: Google Code Wiki-style tables. See
  96  <http://code.google.com/p/support/wiki/WikiSyntax#Tables>.
  97* xml: Passes one-liner processing instructions and namespaced XML tags.
  98"""
  99
 100# Dev Notes:
 101# - Python's regex syntax doesn't have '\z', so I'm using '\Z'. I'm
 102#   not yet sure if there are implications with this. Compare 'pydoc sre'
 103#   and 'perldoc perlre'.
 104
 105__version_info__ = (2, 4, 3)
 106__version__ = '.'.join(map(str, __version_info__))
 107__author__ = "Trent Mick"
 108
 109import sys
 110import re
 111import logging
 112from hashlib import sha256
 113import optparse
 114from random import random, randint
 115import codecs
 116from collections import defaultdict
 117
 118
 119# ---- Python version compat
 120
 121# Use `bytes` for byte strings and `unicode` for unicode strings (str in Py3).
 122if sys.version_info[0] <= 2:
 123    py3 = False
 124    try:
 125        bytes
 126    except NameError:
 127        bytes = str
 128    base_string_type = basestring
 129elif sys.version_info[0] >= 3:
 130    py3 = True
 131    unicode = str
 132    base_string_type = str
 133
 134# ---- globals
 135
 136DEBUG = False
 137log = logging.getLogger("markdown")
 138
 139DEFAULT_TAB_WIDTH = 4
 140
 141
 142SECRET_SALT = bytes(randint(0, 1000000))
 143# MD5 function was previously used for this; the "md5" prefix was kept for
 144# backwards compatibility.
 145def _hash_text(s):
 146    return 'md5-' + sha256(SECRET_SALT + s.encode("utf-8")).hexdigest()[32:]
 147
 148# Table of hash values for escaped characters:
 149g_escape_table = dict([(ch, _hash_text(ch))
 150    for ch in '\\`*_{}[]()>#+-.!'])
 151
 152# Ampersand-encoding based entirely on Nat Irons's Amputator MT plugin:
 153#   http://bumppo.net/projects/amputator/
 154_AMPERSAND_RE = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')
 155
 156
 157# ---- exceptions
 158class MarkdownError(Exception):
 159    pass
 160
 161
 162# ---- public api
 163
 164def markdown_path(path, encoding="utf-8",
 165                  html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
 166                  safe_mode=None, extras=None, link_patterns=None,
 167                  footnote_title=None, footnote_return_symbol=None,
 168                  use_file_vars=False):
 169    fp = codecs.open(path, 'r', encoding)
 170    text = fp.read()
 171    fp.close()
 172    return Markdown(html4tags=html4tags, tab_width=tab_width,
 173                    safe_mode=safe_mode, extras=extras,
 174                    link_patterns=link_patterns,
 175                    footnote_title=footnote_title,
 176                    footnote_return_symbol=footnote_return_symbol,
 177                    use_file_vars=use_file_vars).convert(text)
 178
 179
 180def markdown(text, html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
 181             safe_mode=None, extras=None, link_patterns=None,
 182             footnote_title=None, footnote_return_symbol=None,
 183             use_file_vars=False, cli=False):
 184    return Markdown(html4tags=html4tags, tab_width=tab_width,
 185                    safe_mode=safe_mode, extras=extras,
 186                    link_patterns=link_patterns,
 187                    footnote_title=footnote_title,
 188                    footnote_return_symbol=footnote_return_symbol,
 189                    use_file_vars=use_file_vars, cli=cli).convert(text)
 190
 191
 192class Markdown(object):
 193    # The dict of "extras" to enable in processing -- a mapping of
 194    # extra name to argument for the extra. Most extras do not have an
 195    # argument, in which case the value is None.
 196    #
 197    # This can be set via (a) subclassing and (b) the constructor
 198    # "extras" argument.
 199    extras = None
 200
 201    urls = None
 202    titles = None
 203    html_blocks = None
 204    html_spans = None
 205    html_removed_text = "{(#HTML#)}"  # placeholder removed text that does not trigger bold
 206    html_removed_text_compat = "[HTML_REMOVED]"  # for compat with markdown.py
 207
 208    _toc = None
 209
 210    # Used to track when we're inside an ordered or unordered list
 211    # (see _ProcessListItems() for details):
 212    list_level = 0
 213
 214    _ws_only_line_re = re.compile(r"^[ \t]+$", re.M)
 215
 216    def __init__(self, html4tags=False, tab_width=4, safe_mode=None,
 217                 extras=None, link_patterns=None,
 218                 footnote_title=None, footnote_return_symbol=None,
 219                 use_file_vars=False, cli=False):
 220        if html4tags:
 221            self.empty_element_suffix = ">"
 222        else:
 223            self.empty_element_suffix = " />"
 224        self.tab_width = tab_width
 225        self.tab = tab_width * " "
 226
 227        # For compatibility with earlier markdown2.py and with
 228        # markdown.py's safe_mode being a boolean,
 229        #   safe_mode == True -> "replace"
 230        if safe_mode is True:
 231            self.safe_mode = "replace"
 232        else:
 233            self.safe_mode = safe_mode
 234
 235        # Massaging and building the "extras" info.
 236        if self.extras is None:
 237            self.extras = {}
 238        elif not isinstance(self.extras, dict):
 239            self.extras = dict([(e, None) for e in self.extras])
 240        if extras:
 241            if not isinstance(extras, dict):
 242                extras = dict([(e, None) for e in extras])
 243            self.extras.update(extras)
 244        assert isinstance(self.extras, dict)
 245
 246        if "toc" in self.extras:
 247            if "header-ids" not in self.extras:
 248                self.extras["header-ids"] = None   # "toc" implies "header-ids"
 249
 250            if self.extras["toc"] is None:
 251                self._toc_depth = 6
 252            else:
 253                self._toc_depth = self.extras["toc"].get("depth", 6)
 254        self._instance_extras = self.extras.copy()
 255
 256        self.link_patterns = link_patterns
 257        self.footnote_title = footnote_title
 258        self.footnote_return_symbol = footnote_return_symbol
 259        self.use_file_vars = use_file_vars
 260        self._outdent_re = re.compile(r'^(\t|[ ]{1,%d})' % tab_width, re.M)
 261        self.cli = cli
 262
 263        self._escape_table = g_escape_table.copy()
 264        self._code_table = {}
 265        if "smarty-pants" in self.extras:
 266            self._escape_table['"'] = _hash_text('"')
 267            self._escape_table["'"] = _hash_text("'")
 268
 269    def reset(self):
 270        self.urls = {}
 271        self.titles = {}
 272        self.html_blocks = {}
 273        self.html_spans = {}
 274        self.list_level = 0
 275        self.extras = self._instance_extras.copy()
 276        if "footnotes" in self.extras:
 277            self.footnotes = {}
 278            self.footnote_ids = []
 279        if "header-ids" in self.extras:
 280            self._count_from_header_id = defaultdict(int)
 281        if "metadata" in self.extras:
 282            self.metadata = {}
 283        self._toc = None
 284
 285    # Per <https://developer.mozilla.org/en-US/docs/HTML/Element/a> "rel"
 286    # should only be used in <a> tags with an "href" attribute.
 287
 288    # Opens the linked document in a new window or tab
 289    # should only be used in <a> tags with an "href" attribute.
 290    # same with _a_nofollow
 291    _a_nofollow_or_blank_links = re.compile(r"""
 292        <(a)
 293        (
 294            [^>]*
 295            href=   # href is required
 296            ['"]?   # HTML5 attribute values do not have to be quoted
 297            [^#'"]  # We don't want to match href values that start with # (like footnotes)
 298        )
 299        """,
 300        re.IGNORECASE | re.VERBOSE
 301    )
 302
 303    def convert(self, text):
 304        """Convert the given text."""
 305        # Main function. The order in which other subs are called here is
 306        # essential. Link and image substitutions need to happen before
 307        # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
 308        # and <img> tags get encoded.
 309
 310        # Clear the global hashes. If we don't clear these, you get conflicts
 311        # from other articles when generating a page which contains more than
 312        # one article (e.g. an index page that shows the N most recent
 313        # articles):
 314        self.reset()
 315
 316        if not isinstance(text, unicode):
 317            # TODO: perhaps shouldn't presume UTF-8 for string input?
 318            text = unicode(text, 'utf-8')
 319
 320        if self.use_file_vars:
 321            # Look for emacs-style file variable hints.
 322            emacs_vars = self._get_emacs_vars(text)
 323            if "markdown-extras" in emacs_vars:
 324                splitter = re.compile("[ ,]+")
 325                for e in splitter.split(emacs_vars["markdown-extras"]):
 326                    if '=' in e:
 327                        ename, earg = e.split('=', 1)
 328                        try:
 329                            earg = int(earg)
 330                        except ValueError:
 331                            pass
 332                    else:
 333                        ename, earg = e, None
 334                    self.extras[ename] = earg
 335
 336        # Standardize line endings:
 337        text = text.replace("\r\n", "\n")
 338        text = text.replace("\r", "\n")
 339
 340        # Make sure $text ends with a couple of newlines:
 341        text += "\n\n"
 342
 343        # Convert all tabs to spaces.
 344        text = self._detab(text)
 345
 346        # Strip any lines consisting only of spaces and tabs.
 347        # This makes subsequent regexen easier to write, because we can
 348        # match consecutive blank lines with /\n+/ instead of something
 349        # contorted like /[ \t]*\n+/ .
 350        text = self._ws_only_line_re.sub("", text)
 351
 352        # strip metadata from head and extract
 353        if "metadata" in self.extras:
 354            text = self._extract_metadata(text)
 355
 356        text = self.preprocess(text)
 357
 358        if "fenced-code-blocks" in self.extras and not self.safe_mode:
 359            text = self._do_fenced_code_blocks(text)
 360
 361        if self.safe_mode:
 362            text = self._hash_html_spans(text)
 363
 364        # Turn block-level HTML blocks into hash entries
 365        text = self._hash_html_blocks(text, raw=True)
 366
 367        if "fenced-code-blocks" in self.extras and self.safe_mode:
 368            text = self._do_fenced_code_blocks(text)
 369
 370        # Because numbering references aren't links (yet?) we can do everything associated with counters
 371        # before we get started
 372        if "numbering" in self.extras:
 373            text = self._do_numbering(text)
 374
 375        # Strip link definitions, store in hashes.
 376        if "footnotes" in self.extras:
 377            # Must do footnotes first because an unlucky footnote defn
 378            # looks like a link defn:
 379            #   [^4]: this "looks like a link defn"
 380            text = self._strip_footnote_definitions(text)
 381        text = self._strip_link_definitions(text)
 382
 383        text = self._run_block_gamut(text)
 384
 385        if "footnotes" in self.extras:
 386            text = self._add_footnotes(text)
 387
 388        text = self.postprocess(text)
 389
 390        text = self._unescape_special_chars(text)
 391
 392        if self.safe_mode:
 393            text = self._unhash_html_spans(text)
 394            # return the removed text warning to its markdown.py compatible form
 395            text = text.replace(self.html_removed_text, self.html_removed_text_compat)
 396
 397        do_target_blank_links = "target-blank-links" in self.extras
 398        do_nofollow_links = "nofollow" in self.extras
 399
 400        if do_target_blank_links and do_nofollow_links:
 401            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow noopener" target="_blank"\2', text)
 402        elif do_target_blank_links:
 403            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="noopener" target="_blank"\2', text)
 404        elif do_nofollow_links:
 405            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow"\2', text)
 406
 407        if "toc" in self.extras and self._toc:
 408            self._toc_html = calculate_toc_html(self._toc)
 409
 410            # Prepend toc html to output
 411            if self.cli:
 412                text = '{}\n{}'.format(self._toc_html, text)
 413
 414        text += "\n"
 415
 416        # Attach attrs to output
 417        rv = UnicodeWithAttrs(text)
 418
 419        if "toc" in self.extras and self._toc:
 420            rv.toc_html = self._toc_html
 421
 422        if "metadata" in self.extras:
 423            rv.metadata = self.metadata
 424        return rv
 425
 426    def postprocess(self, text):
 427        """A hook for subclasses to do some postprocessing of the html, if
 428        desired. This is called before unescaping of special chars and
 429        unhashing of raw HTML spans.
 430        """
 431        return text
 432
 433    def preprocess(self, text):
 434        """A hook for subclasses to do some preprocessing of the Markdown, if
 435        desired. This is called after basic formatting of the text, but prior
 436        to any extras, safe mode, etc. processing.
 437        """
 438        return text
 439
 440    # Is metadata if the content starts with optional '---'-fenced `key: value`
 441    # pairs. E.g. (indented for presentation):
 442    #   ---
 443    #   foo: bar
 444    #   another-var: blah blah
 445    #   ---
 446    #   # header
 447    # or:
 448    #   foo: bar
 449    #   another-var: blah blah
 450    #
 451    #   # header
 452    _meta_data_pattern = re.compile(r'^(?:---[\ \t]*\n)?((?:[\S\w]+\s*:(?:\n+[ \t]+.*)+)|(?:.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)|(?:\s*[\S\w]+\s*:(?! >)[ \t]*.*\n?))(?:---[\ \t]*\n)?', re.MULTILINE)
 453    _key_val_pat = re.compile(r"[\S\w]+\s*:(?! >)[ \t]*.*\n?", re.MULTILINE)
 454    # this allows key: >
 455    #                   value
 456    #                   continues over multiple lines
 457    _key_val_block_pat = re.compile(
 458        r"(.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)", re.MULTILINE
 459    )
 460    _key_val_list_pat = re.compile(
 461        r"^-(?:[ \t]*([^\n]*)(?:[ \t]*[:-][ \t]*(\S+))?)(?:\n((?:[ \t]+[^\n]+\n?)+))?",
 462        re.MULTILINE,
 463    )
 464    _key_val_dict_pat = re.compile(
 465        r"^([^:\n]+)[ \t]*:[ \t]*([^\n]*)(?:((?:\n[ \t]+[^\n]+)+))?", re.MULTILINE
 466    )  # grp0: key, grp1: value, grp2: multiline value
 467    _meta_data_fence_pattern = re.compile(r'^---[\ \t]*\n', re.MULTILINE)
 468    _meta_data_newline = re.compile("^\n", re.MULTILINE)
 469
 470    def _extract_metadata(self, text):
 471        if text.startswith("---"):
 472            fence_splits = re.split(self._meta_data_fence_pattern, text, maxsplit=2)
 473            metadata_content = fence_splits[1]
 474            match = re.findall(self._meta_data_pattern, metadata_content)
 475            if not match:
 476                return text
 477            tail = fence_splits[2]
 478        else:
 479            metadata_split = re.split(self._meta_data_newline, text, maxsplit=1)
 480            metadata_content = metadata_split[0]
 481            match = re.findall(self._meta_data_pattern, metadata_content)
 482            if not match:
 483                return text
 484            tail = metadata_split[1]
 485
 486        def parse_structured_value(value):
 487            vs = value.lstrip()
 488            vs = value.replace(v[: len(value) - len(vs)], "\n")[1:]
 489
 490            # List
 491            if vs.startswith("-"):
 492                r = []
 493                for match in re.findall(self._key_val_list_pat, vs):
 494                    if match[0] and not match[1] and not match[2]:
 495                        r.append(match[0].strip())
 496                    elif match[0] == ">" and not match[1] and match[2]:
 497                        r.append(match[2].strip())
 498                    elif match[0] and match[1]:
 499                        r.append({match[0].strip(): match[1].strip()})
 500                    elif not match[0] and not match[1] and match[2]:
 501                        r.append(parse_structured_value(match[2]))
 502                    else:
 503                        # Broken case
 504                        pass
 505
 506                return r
 507
 508            # Dict
 509            else:
 510                return {
 511                    match[0].strip(): (
 512                        match[1].strip()
 513                        if match[1]
 514                        else parse_structured_value(match[2])
 515                    )
 516                    for match in re.findall(self._key_val_dict_pat, vs)
 517                }
 518
 519        for item in match:
 520
 521            k, v = item.split(":", 1)
 522
 523            # Multiline value
 524            if v[:3] == " >\n":
 525                self.metadata[k.strip()] = _dedent(v[3:]).strip()
 526
 527            # Empty value
 528            elif v == "\n":
 529                self.metadata[k.strip()] = ""
 530
 531            # Structured value
 532            elif v[0] == "\n":
 533                self.metadata[k.strip()] = parse_structured_value(v)
 534
 535            # Simple value
 536            else:
 537                self.metadata[k.strip()] = v.strip()
 538
 539        return tail
 540
 541    _emacs_oneliner_vars_pat = re.compile(r"-\*-\s*(?:(\S[^\r\n]*?)([\r\n]\s*)?)?-\*-", re.UNICODE)
 542    # This regular expression is intended to match blocks like this:
 543    #    PREFIX Local Variables: SUFFIX
 544    #    PREFIX mode: Tcl SUFFIX
 545    #    PREFIX End: SUFFIX
 546    # Some notes:
 547    # - "[ \t]" is used instead of "\s" to specifically exclude newlines
 548    # - "(\r\n|\n|\r)" is used instead of "$" because the sre engine does
 549    #   not like anything other than Unix-style line terminators.
 550    _emacs_local_vars_pat = re.compile(r"""^
 551        (?P<prefix>(?:[^\r\n|\n|\r])*?)
 552        [\ \t]*Local\ Variables:[\ \t]*
 553        (?P<suffix>.*?)(?:\r\n|\n|\r)
 554        (?P<content>.*?\1End:)
 555        """, re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)
 556
 557    def _get_emacs_vars(self, text):
 558        """Return a dictionary of emacs-style local variables.
 559
 560        Parsing is done loosely according to this spec (and according to
 561        some in-practice deviations from this):
 562        http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html#Specifying-File-Variables
 563        """
 564        emacs_vars = {}
 565        SIZE = pow(2, 13)  # 8kB
 566
 567        # Search near the start for a '-*-'-style one-liner of variables.
 568        head = text[:SIZE]
 569        if "-*-" in head:
 570            match = self._emacs_oneliner_vars_pat.search(head)
 571            if match:
 572                emacs_vars_str = match.group(1)
 573                assert '\n' not in emacs_vars_str
 574                emacs_var_strs = [s.strip() for s in emacs_vars_str.split(';')
 575                                  if s.strip()]
 576                if len(emacs_var_strs) == 1 and ':' not in emacs_var_strs[0]:
 577                    # While not in the spec, this form is allowed by emacs:
 578                    #   -*- Tcl -*-
 579                    # where the implied "variable" is "mode". This form
 580                    # is only allowed if there are no other variables.
 581                    emacs_vars["mode"] = emacs_var_strs[0].strip()
 582                else:
 583                    for emacs_var_str in emacs_var_strs:
 584                        try:
 585                            variable, value = emacs_var_str.strip().split(':', 1)
 586                        except ValueError:
 587                            log.debug("emacs variables error: malformed -*- "
 588                                      "line: %r", emacs_var_str)
 589                            continue
 590                        # Lowercase the variable name because Emacs allows "Mode"
 591                        # or "mode" or "MoDe", etc.
 592                        emacs_vars[variable.lower()] = value.strip()
 593
 594        tail = text[-SIZE:]
 595        if "Local Variables" in tail:
 596            match = self._emacs_local_vars_pat.search(tail)
 597            if match:
 598                prefix = match.group("prefix")
 599                suffix = match.group("suffix")
 600                lines = match.group("content").splitlines(0)
 601                # print "prefix=%r, suffix=%r, content=%r, lines: %s"\
 602                #      % (prefix, suffix, match.group("content"), lines)
 603
 604                # Validate the Local Variables block: proper prefix and suffix
 605                # usage.
 606                for i, line in enumerate(lines):
 607                    if not line.startswith(prefix):
 608                        log.debug("emacs variables error: line '%s' "
 609                                  "does not use proper prefix '%s'"
 610                                  % (line, prefix))
 611                        return {}
 612                    # Don't validate suffix on last line. Emacs doesn't care,
 613                    # neither should we.
 614                    if i != len(lines)-1 and not line.endswith(suffix):
 615                        log.debug("emacs variables error: line '%s' "
 616                                  "does not use proper suffix '%s'"
 617                                  % (line, suffix))
 618                        return {}
 619
 620                # Parse out one emacs var per line.
 621                continued_for = None
 622                for line in lines[:-1]:  # no var on the last line ("PREFIX End:")
 623                    if prefix: line = line[len(prefix):]  # strip prefix
 624                    if suffix: line = line[:-len(suffix)]  # strip suffix
 625                    line = line.strip()
 626                    if continued_for:
 627                        variable = continued_for
 628                        if line.endswith('\\'):
 629                            line = line[:-1].rstrip()
 630                        else:
 631                            continued_for = None
 632                        emacs_vars[variable] += ' ' + line
 633                    else:
 634                        try:
 635                            variable, value = line.split(':', 1)
 636                        except ValueError:
 637                            log.debug("local variables error: missing colon "
 638                                      "in local variables entry: '%s'" % line)
 639                            continue
 640                        # Do NOT lowercase the variable name, because Emacs only
 641                        # allows "mode" (and not "Mode", "MoDe", etc.) in this block.
 642                        value = value.strip()
 643                        if value.endswith('\\'):
 644                            value = value[:-1].rstrip()
 645                            continued_for = variable
 646                        else:
 647                            continued_for = None
 648                        emacs_vars[variable] = value
 649
 650        # Unquote values.
 651        for var, val in list(emacs_vars.items()):
 652            if len(val) > 1 and (val.startswith('"') and val.endswith('"')
 653           or val.startswith("'") and val.endswith("'")):
 654                emacs_vars[var] = val[1:-1]
 655
 656        return emacs_vars
 657
 658    def _detab_line(self, line):
 659        r"""Recursively convert tabs to spaces in a single line.
 660
 661        Called from _detab()."""
 662        if '\t' not in line:
 663            return line
 664        chunk1, chunk2 = line.split('\t', 1)
 665        chunk1 += (' ' * (self.tab_width - len(chunk1) % self.tab_width))
 666        output = chunk1 + chunk2
 667        return self._detab_line(output)
 668
 669    def _detab(self, text):
 670        r"""Iterate text line by line and convert tabs to spaces.
 671
 672            >>> m = Markdown()
 673            >>> m._detab("\tfoo")
 674            '    foo'
 675            >>> m._detab("  \tfoo")
 676            '    foo'
 677            >>> m._detab("\t  foo")
 678            '      foo'
 679            >>> m._detab("  foo")
 680            '  foo'
 681            >>> m._detab("  foo\n\tbar\tblam")
 682            '  foo\n    bar blam'
 683        """
 684        if '\t' not in text:
 685            return text
 686        output = []
 687        for line in text.splitlines():
 688            output.append(self._detab_line(line))
 689        return '\n'.join(output)
 690
 691    # I broke out the html5 tags here and add them to _block_tags_a and
 692    # _block_tags_b.  This way html5 tags are easy to keep track of.
 693    _html5tags = '|article|aside|header|hgroup|footer|nav|section|figure|figcaption'
 694
 695    _block_tags_a = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del'
 696    _block_tags_a += _html5tags
 697
 698    _strict_tag_block_re = re.compile(r"""
 699        (                       # save in \1
 700            ^                   # start of line  (with re.M)
 701            <(%s)               # start tag = \2
 702            \b                  # word break
 703            (.*\n)*?            # any number of lines, minimally matching
 704            </\2>               # the matching end tag
 705            [ \t]*              # trailing spaces/tabs
 706            (?=\n+|\Z)          # followed by a newline or end of document
 707        )
 708        """ % _block_tags_a,
 709        re.X | re.M)
 710
 711    _block_tags_b = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math'
 712    _block_tags_b += _html5tags
 713
 714    _liberal_tag_block_re = re.compile(r"""
 715        (                       # save in \1
 716            ^                   # start of line  (with re.M)
 717            <(%s)               # start tag = \2
 718            \b                  # word break
 719            (.*\n)*?            # any number of lines, minimally matching
 720            .*</\2>             # the matching end tag
 721            [ \t]*              # trailing spaces/tabs
 722            (?=\n+|\Z)          # followed by a newline or end of document
 723        )
 724        """ % _block_tags_b,
 725        re.X | re.M)
 726
 727    _html_markdown_attr_re = re.compile(
 728        r'''\s+markdown=("1"|'1')''')
 729    def _hash_html_block_sub(self, match, raw=False):
 730        html = match.group(1)
 731        if raw and self.safe_mode:
 732            html = self._sanitize_html(html)
 733        elif 'markdown-in-html' in self.extras and 'markdown=' in html:
 734            first_line = html.split('\n', 1)[0]
 735            m = self._html_markdown_attr_re.search(first_line)
 736            if m:
 737                lines = html.split('\n')
 738                middle = '\n'.join(lines[1:-1])
 739                last_line = lines[-1]
 740                first_line = first_line[:m.start()] + first_line[m.end():]
 741                f_key = _hash_text(first_line)
 742                self.html_blocks[f_key] = first_line
 743                l_key = _hash_text(last_line)
 744                self.html_blocks[l_key] = last_line
 745                return ''.join(["\n\n", f_key,
 746                    "\n\n", middle, "\n\n",
 747                    l_key, "\n\n"])
 748        key = _hash_text(html)
 749        self.html_blocks[key] = html
 750        return "\n\n" + key + "\n\n"
 751
 752    def _hash_html_blocks(self, text, raw=False):
 753        """Hashify HTML blocks
 754
 755        We only want to do this for block-level HTML tags, such as headers,
 756        lists, and tables. That's because we still want to wrap <p>s around
 757        "paragraphs" that are wrapped in non-block-level tags, such as anchors,
 758        phrase emphasis, and spans. The list of tags we're looking for is
 759        hard-coded.
 760
 761        @param raw {boolean} indicates if these are raw HTML blocks in
 762            the original source. It makes a difference in "safe" mode.
 763        """
 764        if '<' not in text:
 765            return text
 766
 767        # Pass `raw` value into our calls to self._hash_html_block_sub.
 768        hash_html_block_sub = _curry(self._hash_html_block_sub, raw=raw)
 769
 770        # First, look for nested blocks, e.g.:
 771        #   <div>
 772        #       <div>
 773        #       tags for inner block must be indented.
 774        #       </div>
 775        #   </div>
 776        #
 777        # The outermost tags must start at the left margin for this to match, and
 778        # the inner nested divs must be indented.
 779        # We need to do this before the next, more liberal match, because the next
 780        # match will start at the first `<div>` and stop at the first `</div>`.
 781        text = self._strict_tag_block_re.sub(hash_html_block_sub, text)
 782
 783        # Now match more liberally, simply from `\n<tag>` to `</tag>\n`
 784        text = self._liberal_tag_block_re.sub(hash_html_block_sub, text)
 785
 786        # Special case just for <hr />. It was easier to make a special
 787        # case than to make the other regex more complicated.
 788        if "<hr" in text:
 789            _hr_tag_re = _hr_tag_re_from_tab_width(self.tab_width)
 790            text = _hr_tag_re.sub(hash_html_block_sub, text)
 791
 792        # Special case for standalone HTML comments:
 793        if "<!--" in text:
 794            start = 0
 795            while True:
 796                # Delimiters for next comment block.
 797                try:
 798                    start_idx = text.index("<!--", start)
 799                except ValueError:
 800                    break
 801                try:
 802                    end_idx = text.index("-->", start_idx) + 3
 803                except ValueError:
 804                    break
 805
 806                # Start position for next comment block search.
 807                start = end_idx
 808
 809                # Validate whitespace before comment.
 810                if start_idx:
 811                    # - Up to `tab_width - 1` spaces before start_idx.
 812                    for i in range(self.tab_width - 1):
 813                        if text[start_idx - 1] != ' ':
 814                            break
 815                        start_idx -= 1
 816                        if start_idx == 0:
 817                            break
 818                    # - Must be preceded by 2 newlines or hit the start of
 819                    #   the document.
 820                    if start_idx == 0:
 821                        pass
 822                    elif start_idx == 1 and text[0] == '\n':
 823                        start_idx = 0  # to match minute detail of Markdown.pl regex
 824                    elif text[start_idx-2:start_idx] == '\n\n':
 825                        pass
 826                    else:
 827                        break
 828
 829                # Validate whitespace after comment.
 830                # - Any number of spaces and tabs.
 831                while end_idx < len(text):
 832                    if text[end_idx] not in ' \t':
 833                        break
 834                    end_idx += 1
 835                # - Must be followed by 2 newlines or hit end of text.
 836                if text[end_idx:end_idx+2] not in ('', '\n', '\n\n'):
 837                    continue
 838
 839                # Escape and hash (must match `_hash_html_block_sub`).
 840                html = text[start_idx:end_idx]
 841                if raw and self.safe_mode:
 842                    html = self._sanitize_html(html)
 843                key = _hash_text(html)
 844                self.html_blocks[key] = html
 845                text = text[:start_idx] + "\n\n" + key + "\n\n" + text[end_idx:]
 846
 847        if "xml" in self.extras:
 848            # Treat XML processing instructions and namespaced one-liner
 849            # tags as if they were block HTML tags. E.g., if standalone
 850            # (i.e. are their own paragraph), the following do not get
 851            # wrapped in a <p> tag:
 852            #    <?foo bar?>
 853            #
 854            #    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="chapter_1.md"/>
 855            _xml_oneliner_re = _xml_oneliner_re_from_tab_width(self.tab_width)
 856            text = _xml_oneliner_re.sub(hash_html_block_sub, text)
 857
 858        return text
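The "hashify" strategy above can be illustrated in isolation (a minimal sketch, not markdown2's actual code; the key format merely mimics the shape produced by its `_hash_text` helper): a literal HTML block is replaced by an opaque digest key so later text transformations cannot touch it, and the key is swapped back afterwards.

```python
import hashlib

def hash_text(s):
    # A key of the same general shape as markdown2's _hash_text: an
    # MD5 digest that cannot occur in ordinary prose.
    return 'md5-' + hashlib.md5(s.encode('utf-8')).hexdigest()

html_blocks = {}
text = "before\n\n<div>*not emphasis*</div>\n\nafter"
block = "<div>*not emphasis*</div>"

# Hash: replace the raw block with its key, padded by blank lines.
key = hash_text(block)
html_blocks[key] = block
text = text.replace(block, '\n\n' + key + '\n\n')
hashed = text  # at this point span transforms can run safely

# Unhash: restore every stored block.
for k, v in html_blocks.items():
    text = text.replace(k, v)
```

The `*not emphasis*` inside the `<div>` survives untouched because, while hashed, there is no `*` left in the text for the italics pass to find.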
 859
 860    def _strip_link_definitions(self, text):
 861        # Strips link definitions from text, stores the URLs and titles in
 862        # hash references.
 863        less_than_tab = self.tab_width - 1
 864
 865        # Link defs are in the form:
 866        #   [id]: url "optional title"
 867        _link_def_re = re.compile(r"""
 868            ^[ ]{0,%d}\[(.+)\]: # id = \1
 869              [ \t]*
 870              \n?               # maybe *one* newline
 871              [ \t]*
 872            <?(.+?)>?           # url = \2
 873              [ \t]*
 874            (?:
 875                \n?             # maybe one newline
 876                [ \t]*
 877                (?<=\s)         # lookbehind for whitespace
 878                ['"(]
 879                ([^\n]*)        # title = \3
 880                ['")]
 881                [ \t]*
 882            )?  # title is optional
 883            (?:\n+|\Z)
 884            """ % less_than_tab, re.X | re.M | re.U)
 885        return _link_def_re.sub(self._extract_link_def_sub, text)
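The extract-and-strip pattern used here can be shown with a deliberately simplified regex (an illustrative toy, much looser than `_link_def_re` above): each `[id]: url "title"` line is pulled into dictionaries by the substitution callback and removed from the body.

```python
import re

# Toy link-definition pass: one-line defs only, double-quoted titles.
link_def_re = re.compile(
    r'^ {0,3}\[(.+)\]:\s*(\S+)(?:\s+"([^"\n]*)")?\s*$', re.M)

urls, titles = {}, {}

def extract(m):
    key = m.group(1).lower()   # link ids are case-insensitive
    urls[key] = m.group(2)
    if m.group(3):
        titles[key] = m.group(3)
    return ''                  # strip the definition from the text

text = link_def_re.sub(extract, '[Py]: https://python.org "Python"\nbody\n')
```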
 886
 887    def _extract_link_def_sub(self, match):
 888        id, url, title = match.groups()
 889        key = id.lower()    # Link IDs are case-insensitive
 890        self.urls[key] = self._encode_amps_and_angles(url)
 891        if title:
 892            self.titles[key] = title
 893        return ""
 894
 895    def _do_numbering(self, text):
 896        '''Handle the special extension for generic numbering of
 897            tables, figures, etc.
 898        '''
 899        # First pass to define all the references
 900        self.regex_defns = re.compile(r'''
 901            \[\#(\w+) # the counter.  Open square plus hash plus a word \1
 902            ([^@]*)   # Some optional characters, that aren't an @. \2
 903            @(\w+)       # the id.  Should this be normed? \3
 904            ([^\]]*)\]   # The rest of the text up to the terminating ] \4
 905            ''', re.VERBOSE)
 906        self.regex_subs = re.compile(r"\[@(\w+)\s*\]")  # [@ref_id]
 907        counters = {}
 908        references = {}
 909        replacements = []
 910        definition_html = '<figcaption class="{}" id="counter-ref-{}">{}{}{}</figcaption>'
 911        reference_html = '<a class="{}" href="#counter-ref-{}">{}</a>'
 912        for match in self.regex_defns.finditer(text):
 913            # We must have four match groups, otherwise this isn't a numbering reference
 914            if len(match.groups()) != 4:
 915                continue
 916            counter = match.group(1)
 917            text_before = match.group(2).strip()
 918            ref_id = match.group(3)
 919            text_after = match.group(4)
 920            number = counters.get(counter, 1)
 921            references[ref_id] = (number, counter)
 922            replacements.append((match.start(0),
 923                                 definition_html.format(counter,
 924                                                        ref_id,
 925                                                        text_before,
 926                                                        number,
 927                                                        text_after),
 928                                 match.end(0)))
 929            counters[counter] = number + 1
 930        for repl in reversed(replacements):
 931            text = text[:repl[0]] + repl[1] + text[repl[2]:]
 932
 933        # Second pass to replace the references with the right
 934        # value of the counter
 935        # Fwiw, it's vaguely annoying to have to turn the iterator into
 936        # a list and then reverse it but I can't think of a better thing to do.
 937        for match in reversed(list(self.regex_subs.finditer(text))):
 938            number, counter = references.get(match.group(1), (None, None))
 939            if number is not None:
 940                repl = reference_html.format(counter,
 941                                             match.group(1),
 942                                             number)
 943            else:
 944                repl = reference_html.format(match.group(1),
 945                                             'countererror',
 946                                             '?' + match.group(1) + '?')
 947            if "smarty-pants" in self.extras:
 948                repl = repl.replace('"', self._escape_table['"'])
 949
 950            text = text[:match.start()] + repl + text[match.end():]
 951        return text
 952
 953    def _extract_footnote_def_sub(self, match):
 954        id, text = match.groups()
 955        text = _dedent(text, skip_first_line=not text.startswith('\n')).strip()
 956        normed_id = re.sub(r'\W', '-', id)
 957        # Ensure footnote text ends with a couple newlines (for some
 958        # block gamut matches).
 959        self.footnotes[normed_id] = text + "\n\n"
 960        return ""
 961
 962    def _strip_footnote_definitions(self, text):
 963        """A footnote definition looks like this:
 964
 965            [^note-id]: Text of the note.
 966
 967                May include one or more indented paragraphs.
 968
 969        Where,
 970        - The 'note-id' can be pretty much anything, though typically it
 971          is the number of the footnote.
 972        - The first paragraph may start on the next line, like so:
 973
 974            [^note-id]:
 975                Text of the note.
 976        """
 977        less_than_tab = self.tab_width - 1
 978        footnote_def_re = re.compile(r'''
 979            ^[ ]{0,%d}\[\^(.+)\]:   # id = \1
 980            [ \t]*
 981            (                       # footnote text = \2
 982              # First line need not start with the spaces.
 983              (?:\s*.*\n+)
 984              (?:
 985                (?:[ ]{%d} | \t)  # Subsequent lines must be indented.
 986                .*\n+
 987              )*
 988            )
 989            # Lookahead for non-space at line-start, or end of doc.
 990            (?:(?=^[ ]{0,%d}\S)|\Z)
 991            ''' % (less_than_tab, self.tab_width, self.tab_width),
 992            re.X | re.M)
 993        return footnote_def_re.sub(self._extract_footnote_def_sub, text)
 994
 995    _hr_re = re.compile(r'^[ ]{0,3}([-_*])[ ]{0,2}(\1[ ]{0,2}){2,}$', re.M)
 996
 997    def _run_block_gamut(self, text):
 998        # These are all the transformations that form block-level
 999        # tags like paragraphs, headers, and list items.
1000
1001        if "fenced-code-blocks" in self.extras:
1002            text = self._do_fenced_code_blocks(text)
1003
1004        text = self._do_headers(text)
1005
1006        # Do Horizontal Rules:
1007        # On the number of spaces in horizontal rules: The spec is fuzzy: "If
1008        # you wish, you may use spaces between the hyphens or asterisks."
1009        # Markdown.pl 1.0.1's hr regexes limit the number of spaces between the
1010        # hr chars to one or two. We'll reproduce that limit here.
1011        hr = "\n<hr"+self.empty_element_suffix+"\n"
1012        text = re.sub(self._hr_re, hr, text)
1013
1014        text = self._do_lists(text)
1015
1016        if "pyshell" in self.extras:
1017            text = self._prepare_pyshell_blocks(text)
1018        if "wiki-tables" in self.extras:
1019            text = self._do_wiki_tables(text)
1020        if "tables" in self.extras:
1021            text = self._do_tables(text)
1022
1023        text = self._do_code_blocks(text)
1024
1025        text = self._do_block_quotes(text)
1026
1027        # We already ran _HashHTMLBlocks() before, in Markdown(), but that
1028        # was to escape raw HTML in the original Markdown source. This time,
1029        # we're escaping the markup we've just created, so that we don't wrap
1030        # <p> tags around block-level tags.
1031        text = self._hash_html_blocks(text)
1032
1033        text = self._form_paragraphs(text)
1034
1035        return text
1036
1037    def _pyshell_block_sub(self, match):
1038        if "fenced-code-blocks" in self.extras:
1039            dedented = _dedent(match.group(0))
1040            return self._do_fenced_code_blocks("```pycon\n" + dedented + "```\n")
1041        lines = match.group(0).splitlines(0)
1042        _dedentlines(lines)
1043        indent = ' ' * self.tab_width
1044        s = ('\n'  # separate from possible cuddled paragraph
1045             + indent + ('\n'+indent).join(lines)
1046             + '\n\n')
1047        return s
1048
1049    def _prepare_pyshell_blocks(self, text):
1050        """Ensure that Python interactive shell sessions are put in
1051        code blocks -- even if not properly indented.
1052        """
1053        if ">>>" not in text:
1054            return text
1055
1056        less_than_tab = self.tab_width - 1
1057        _pyshell_block_re = re.compile(r"""
1058            ^([ ]{0,%d})>>>[ ].*\n  # first line
1059            ^(\1[^\S\n]*\S.*\n)*    # any number of subsequent lines with at least one character
1060            ^\n                     # ends with a blank line
1061            """ % less_than_tab, re.M | re.X)
1062
1063        return _pyshell_block_re.sub(self._pyshell_block_sub, text)
1064
1065    def _table_sub(self, match):
1066        trim_space_re = '^[ \t\n]+|[ \t\n]+$'
1067        trim_bar_re = r'^\||\|$'
1068        split_bar_re = r'^\||(?<![\`\\])\|'
1069        escape_bar_re = r'\\\|'
1070
1071        head, underline, body = match.groups()
1072
1073        # Determine aligns for columns.
1074        cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", underline)))]
1075        align_from_col_idx = {}
1076        for col_idx, col in enumerate(cols):
1077            if col[0] == ':' and col[-1] == ':':
1078                align_from_col_idx[col_idx] = ' style="text-align:center;"'
1079            elif col[0] == ':':
1080                align_from_col_idx[col_idx] = ' style="text-align:left;"'
1081            elif col[-1] == ':':
1082                align_from_col_idx[col_idx] = ' style="text-align:right;"'
1083
1084        # thead
1085        hlines = ['<table%s>' % self._html_class_str_from_tag('table'), '<thead>', '<tr>']
1086        cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", head)))]
1087        for col_idx, col in enumerate(cols):
1088            hlines.append('  <th%s>%s</th>' % (
1089                align_from_col_idx.get(col_idx, ''),
1090                self._run_span_gamut(col)
1091            ))
1092        hlines.append('</tr>')
1093        hlines.append('</thead>')
1094
1095        # tbody
1096        hlines.append('<tbody>')
1097        for line in body.strip('\n').split('\n'):
1098            hlines.append('<tr>')
1099            cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", line)))]
1100            for col_idx, col in enumerate(cols):
1101                hlines.append('  <td%s>%s</td>' % (
1102                    align_from_col_idx.get(col_idx, ''),
1103                    self._run_span_gamut(col)
1104                ))
1105            hlines.append('</tr>')
1106        hlines.append('</tbody>')
1107        hlines.append('</table>')
1108
1109        return '\n'.join(hlines) + '\n'
1110
1111    def _do_tables(self, text):
1112        """Copying PHP-Markdown and GFM table syntax. Some regex borrowed from
1113        https://github.com/michelf/php-markdown/blob/lib/Michelf/Markdown.php#L2538
1114        """
1115        less_than_tab = self.tab_width - 1
1116        table_re = re.compile(r'''
1117                (?:(?<=\n\n)|\A\n?)             # leading blank line
1118
1119                ^[ ]{0,%d}                      # allowed whitespace
1120                (.*[|].*)  \n                   # $1: header row (at least one pipe)
1121
1122                ^[ ]{0,%d}                      # allowed whitespace
1123                (                               # $2: underline row
1124                    # underline row with leading bar
1125                    (?:  \|\ *:?-+:?\ *  )+  \|? \s? \n
1126                    |
1127                    # or, underline row without leading bar
1128                    (?:  \ *:?-+:?\ *\|  )+  (?:  \ *:?-+:?\ *  )? \s? \n
1129                )
1130
1131                (                               # $3: data rows
1132                    (?:
1133                        ^[ ]{0,%d}(?!\ )         # ensure line begins with 0 to less_than_tab spaces
1134                        .*\|.*  \n
1135                    )+
1136                )
1137            ''' % (less_than_tab, less_than_tab, less_than_tab), re.M | re.X)
1138        return table_re.sub(self._table_sub, text)
1139
1140    def _wiki_table_sub(self, match):
1141        ttext = match.group(0).strip()
1142        # print('wiki table: %r' % match.group(0))
1143        rows = []
1144        for line in ttext.splitlines(0):
1145            line = line.strip()[2:-2].strip()
1146            row = [c.strip() for c in re.split(r'(?<!\\)\|\|', line)]
1147            rows.append(row)
1148        # from pprint import pprint
1149        # pprint(rows)
1150        hlines = []
1151
1152        def add_hline(line, indents=0):
1153            hlines.append((self.tab * indents) + line)
1154
1155        def format_cell(text):
1156            return self._run_span_gamut(re.sub(r"^\s*~", "", text).strip(" "))
1157
1158        add_hline('<table%s>' % self._html_class_str_from_tag('table'))
1159        # Check if first cell of first row is a header cell. If so, assume the whole row is a header row.
1160        if rows and rows[0] and re.match(r"^\s*~", rows[0][0]):
1161            add_hline('<thead>', 1)
1162            add_hline('<tr>', 2)
1163            for cell in rows[0]:
1164                add_hline("<th>{}</th>".format(format_cell(cell)), 3)
1165            add_hline('</tr>', 2)
1166            add_hline('</thead>', 1)
1167            # Only one header row allowed.
1168            rows = rows[1:]
1169        # If no more rows, don't create a tbody.
1170        if rows:
1171            add_hline('<tbody>', 1)
1172            for row in rows:
1173                add_hline('<tr>', 2)
1174                for cell in row:
1175                    add_hline('<td>{}</td>'.format(format_cell(cell)), 3)
1176                add_hline('</tr>', 2)
1177            add_hline('</tbody>', 1)
1178        add_hline('</table>')
1179        return '\n'.join(hlines) + '\n'
1180
1181    def _do_wiki_tables(self, text):
1182        # Optimization.
1183        if "||" not in text:
1184            return text
1185
1186        less_than_tab = self.tab_width - 1
1187        wiki_table_re = re.compile(r'''
1188            (?:(?<=\n\n)|\A\n?)            # leading blank line
1189            ^([ ]{0,%d})\|\|.+?\|\|[ ]*\n  # first line
1190            (^\1\|\|.+?\|\|\n)*        # any number of subsequent lines
1191            ''' % less_than_tab, re.M | re.X)
1192        return wiki_table_re.sub(self._wiki_table_sub, text)
1193
1194    def _run_span_gamut(self, text):
1195        # These are all the transformations that occur *within* block-level
1196        # tags like paragraphs, headers, and list items.
1197
1198        text = self._do_code_spans(text)
1199
1200        text = self._escape_special_chars(text)
1201
1202        # Process anchor and image tags.
1203        if "link-patterns" in self.extras:
1204            text = self._do_link_patterns(text)
1205
1206        text = self._do_links(text)
1207
1208        # Make links out of things like `<http://example.com/>`
1209        # Must come after _do_links(), because you can use < and >
1210        # delimiters in inline links like [this](<url>).
1211        text = self._do_auto_links(text)
1212
1213        text = self._encode_amps_and_angles(text)
1214
1215        if "strike" in self.extras:
1216            text = self._do_strike(text)
1217
1218        if "underline" in self.extras:
1219            text = self._do_underline(text)
1220
1221        text = self._do_italics_and_bold(text)
1222
1223        if "smarty-pants" in self.extras:
1224            text = self._do_smart_punctuation(text)
1225
1226        # Do hard breaks:
1227        if "break-on-newline" in self.extras:
1228            text = re.sub(r" *\n", "<br%s\n" % self.empty_element_suffix, text)
1229        else:
1230            text = re.sub(r" {2,}\n", " <br%s\n" % self.empty_element_suffix, text)
1231
1232        return text
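The two hard-break rules at the end of the span gamut can be tried in isolation (assuming the default XHTML `empty_element_suffix` of `" />"`):

```python
import re

suffix = " />"  # markdown2's default empty_element_suffix
src = "line one  \nline two\nline three\n"

# Default rule: only two-or-more trailing spaces force a <br />.
default = re.sub(r" {2,}\n", " <br%s\n" % suffix, src)

# "break-on-newline" extra: every newline becomes a hard break.
eager = re.sub(r" *\n", "<br%s\n" % suffix, src)
```

With the default rule only `line one` (which ends in two spaces) gains a `<br />`; with the extra enabled, all three lines do.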
1233
1234    # "Sorta" because auto-links are identified as "tag" tokens.
1235    _sorta_html_tokenize_re = re.compile(r"""
1236        (
1237            # tag
1238            </?
1239            (?:\w+)                                     # tag name
1240            (?:\s+(?:[\w-]+:)?[\w-]+=(?:".*?"|'.*?'))*  # attributes
1241            \s*/?>
1242            |
1243            # auto-link (e.g., <http://www.activestate.com/>)
1244            <[\w~:/?#\[\]@!$&'\(\)*+,;%=\.\\-]+>
1245            |
1246            <!--.*?-->      # comment
1247            |
1248            <\?.*?\?>       # processing instruction
1249        )
1250        """, re.X)
1251
1252    def _escape_special_chars(self, text):
1253        # Python markdown note: the HTML tokenization here differs from
1254        # that in Markdown.pl, hence the behaviour for subtle cases can
1255        # differ (I believe the tokenizer here does a better job because
1256        # it isn't susceptible to unmatched '<' and '>' in HTML tags).
1257        # Note, however, that '>' is not allowed in an auto-link URL
1258        # here.
1259        escaped = []
1260        is_html_markup = False
1261        for token in self._sorta_html_tokenize_re.split(text):
1262            if is_html_markup:
1263                # Within tags/HTML-comments/auto-links, encode * and _
1264                # so they don't conflict with their use in Markdown for
1265                # italics and strong.  We're replacing each such
1266                # character with its corresponding MD5 checksum value;
1267                # this is likely overkill, but it should prevent us from
1268                # colliding with the escape values by accident.
1269                escaped.append(token.replace('*', self._escape_table['*'])
1270                                    .replace('_', self._escape_table['_']))
1271            else:
1272                escaped.append(self._encode_backslash_escapes(token))
1273            is_html_markup = not is_html_markup
1274        return ''.join(escaped)
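The `is_html_markup` flip above relies on a documented behavior of `re.split`: when the pattern contains a capturing group, the captured separators appear in the result, so the list strictly alternates between plain text and markup tokens. A toy version of the split (using a much simpler tag pattern than `_sorta_html_tokenize_re`):

```python
import re

# The capturing group makes re.split() return
# [text, tag, text, tag, ..., text], so flipping a boolean per token
# is enough to tell prose from markup.
tok_re = re.compile(r'(</?\w+\s*/?>)')
parts = tok_re.split('before <em>mid</em> after')
```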
1275
1276    def _hash_html_spans(self, text):
1277        # Used for safe_mode.
1278
1279        def _is_auto_link(s):
1280            if ':' in s and self._auto_link_re.match(s):
1281                return True
1282            elif '@' in s and self._auto_email_link_re.match(s):
1283                return True
1284            return False
1285
1286        tokens = []
1287        is_html_markup = False
1288        for token in self._sorta_html_tokenize_re.split(text):
1289            if is_html_markup and not _is_auto_link(token):
1290                sanitized = self._sanitize_html(token)
1291                key = _hash_text(sanitized)
1292                self.html_spans[key] = sanitized
1293                tokens.append(key)
1294            else:
1295                tokens.append(self._encode_incomplete_tags(token))
1296            is_html_markup = not is_html_markup
1297        return ''.join(tokens)
1298
1299    def _unhash_html_spans(self, text):
1300        for key, sanitized in list(self.html_spans.items()):
1301            text = text.replace(key, sanitized)
1302        return text
1303
1304    def _sanitize_html(self, s):
1305        if self.safe_mode == "replace":
1306            return self.html_removed_text
1307        elif self.safe_mode == "escape":
1308            replacements = [
1309                ('&', '&amp;'),
1310                ('<', '&lt;'),
1311                ('>', '&gt;'),
1312            ]
1313            for before, after in replacements:
1314                s = s.replace(before, after)
1315            return s
1316        else:
1317            raise MarkdownError("invalid value for 'safe_mode': %r (must be "
1318                                "'escape' or 'replace')" % self.safe_mode)
1319
1320    _inline_link_title = re.compile(r'''
1321            (                   # \1
1322              [ \t]+
1323              (['"])            # quote char = \2
1324              (?P<title>.*?)
1325              \2
1326            )?                  # title is optional
1327          \)$
1328        ''', re.X | re.S)
1329    _tail_of_reference_link_re = re.compile(r'''
1330          # Match tail of: [text][id]
1331          [ ]?          # one optional space
1332          (?:\n[ ]*)?   # one optional newline followed by spaces
1333          \[
1334            (?P<id>.*?)
1335          \]
1336        ''', re.X | re.S)
1337
1338    _whitespace = re.compile(r'\s*')
1339
1340    _strip_anglebrackets = re.compile(r'<(.*)>.*')
1341
1342    def _find_non_whitespace(self, text, start):
1343        """Returns the index of the first non-whitespace character in text
1344        after (and including) start
1345        """
1346        match = self._whitespace.match(text, start)
1347        return match.end()
1348
1349    def _find_balanced(self, text, start, open_c, close_c):
1350        """Returns the index where the open_c and close_c characters balance
1351        out - the same number of open_c and close_c are encountered - or the
1352        end of string if it's reached before the balance point is found.
1353        """
1354        i = start
1355        l = len(text)
1356        count = 1
1357        while count > 0 and i < l:
1358            if text[i] == open_c:
1359                count += 1
1360            elif text[i] == close_c:
1361                count -= 1
1362            i += 1
1363        return i
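`_find_balanced` starts its count at 1 because the caller has already consumed one opening delimiter; the returned index is one past the balancing close (or `len(text)` if the delimiters never balance). A standalone sketch with the same logic:

```python
def find_balanced(text, start, open_c, close_c):
    # `start` points just past an already-seen open_c, so the count
    # begins at 1; scan until it returns to 0 or the text runs out.
    i, count = start, 1
    while count > 0 and i < len(text):
        if text[i] == open_c:
            count += 1
        elif text[i] == close_c:
            count -= 1
        i += 1
    return i
```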
1364
1365    def _extract_url_and_title(self, text, start):
1366        """Extracts the url and (optional) title from the tail of a link"""
1367        # text[start] equals the opening parenthesis
1368        idx = self._find_non_whitespace(text, start+1)
1369        if idx == len(text):
1370            return None, None, None
1371        end_idx = idx
1372        has_anglebrackets = text[idx] == "<"
1373        if has_anglebrackets:
1374            end_idx = self._find_balanced(text, end_idx+1, "<", ">")
1375        end_idx = self._find_balanced(text, end_idx, "(", ")")
1376        match = self._inline_link_title.search(text, idx, end_idx)
1377        if not match:
1378            return None, None, None
1379        url, title = text[idx:match.start()], match.group("title")
1380        if has_anglebrackets:
1381            url = self._strip_anglebrackets.sub(r'\1', url)
1382        return url, title, end_idx
1383
1384    _safe_protocols = re.compile(r'(https?|ftp):', re.I)
1385    def _do_links(self, text):
1386        """Turn Markdown link shortcuts into XHTML <a> and <img> tags.
1387
1388        This is a combination of Markdown.pl's _DoAnchors() and
1389        _DoImages(). They are done together because that simplified the
1390        approach. It was necessary to use a different approach than
1391        Markdown.pl because Python's regex engine lacks the atomic
1392        matching support that Markdown.pl's $g_nested_brackets relies on.
1393        """
1394        MAX_LINK_TEXT_SENTINEL = 3000  # markdown2 issue 24
1395
1396        # `anchor_allowed_pos` is used to support img links inside
1397        # anchors, but not anchors inside anchors. An anchor's start
1398        # pos must be `>= anchor_allowed_pos`.
1399        anchor_allowed_pos = 0
1400
1401        curr_pos = 0
1402        while True:  # Handle the next link.
1403            # The next '[' is the start of:
1404            # - an inline anchor:   [text](url "title")
1405            # - a reference anchor: [text][id]
1406            # - an inline img:      ![text](url "title")
1407            # - a reference img:    ![text][id]
1408            # - a footnote ref:     [^id]
1409            #   (Only if 'footnotes' extra enabled)
1410            # - a footnote defn:    [^id]: ...
1411            #   (Only if 'footnotes' extra enabled) These have already
1412            #   been stripped in _strip_footnote_definitions() so no
1413            #   need to watch for them.
1414            # - a link definition:  [id]: url "title"
1415            #   These have already been stripped in
1416            #   _strip_link_definitions() so no need to watch for them.
1417            # - not markup:         [...anything else...
1418            try:
1419                start_idx = text.index('[', curr_pos)
1420            except ValueError:
1421                break
1422            text_length = len(text)
1423
1424            # Find the matching closing ']'.
1425            # Markdown.pl allows *matching* brackets in link text so we
1426            # will here too. Markdown.pl *doesn't* currently allow
1427            # matching brackets in img alt text -- we'll differ in that
1428            # regard.
1429            bracket_depth = 0
1430            for p in range(start_idx+1, min(start_idx+MAX_LINK_TEXT_SENTINEL,
1431                                            text_length)):
1432                ch = text[p]
1433                if ch == ']':
1434                    bracket_depth -= 1
1435                    if bracket_depth < 0:
1436                        break
1437                elif ch == '[':
1438                    bracket_depth += 1
1439            else:
1440                # Closing bracket not found within sentinel length.
1441                # This isn't markup.
1442                curr_pos = start_idx + 1
1443                continue
1444            link_text = text[start_idx+1:p]
1445
1446            # Fix for issue 341 - Injecting XSS into link text
1447            if self.safe_mode:
1448                link_text = self._hash_html_spans(link_text)
1449                link_text = self._unhash_html_spans(link_text)
1450
1451            # Possibly a footnote ref?
1452            if "footnotes" in self.extras and link_text.startswith("^"):
1453                normed_id = re.sub(r'\W', '-', link_text[1:])
1454                if normed_id in self.footnotes:
1455                    self.footnote_ids.append(normed_id)
1456                    result = '<sup class="footnote-ref" id="fnref-%s">' \
1457                             '<a href="#fn-%s">%s</a></sup>' \
1458                             % (normed_id, normed_id, len(self.footnote_ids))
1459                    text = text[:start_idx] + result + text[p+1:]
1460                else:
1461                    # This id isn't defined, leave the markup alone.
1462                    curr_pos = p+1
1463                continue
1464
1465            # Now determine what this is by the remainder.
1466            p += 1
1467            if p == text_length:
1468                return text
1469
1470            # Inline anchor or img?
1471            if text[p] == '(':  # attempt at perf improvement
1472                url, title, url_end_idx = self._extract_url_and_title(text, p)
1473                if url is not None:
1474                    # Handle an inline anchor or img.
1475                    is_img = start_idx > 0 and text[start_idx-1] == "!"
1476                    if is_img:
1477                        start_idx -= 1
1478
1479                    # We've got to encode these to avoid conflicting
1480                    # with italics/bold.
1481                    url = url.replace('*', self._escape_table['*']) \
1482                             .replace('_', self._escape_table['_'])
1483                    if title:
1484                        title_str = ' title="%s"' % (
1485                            _xml_escape_attr(title)
1486                                .replace('*', self._escape_table['*'])
1487                                .replace('_', self._escape_table['_']))
1488                    else:
1489                        title_str = ''
1490                    if is_img:
1491                        img_class_str = self._html_class_str_from_tag("img")
1492                        result = '<img src="%s" alt="%s"%s%s%s' \
1493                            % (_html_escape_url(url, safe_mode=self.safe_mode),
1494                               _xml_escape_attr(link_text),
1495                               title_str,
1496                               img_class_str,
1497                               self.empty_element_suffix)
1498                        if "smarty-pants" in self.extras:
1499                            result = result.replace('"', self._escape_table['"'])
1500                        curr_pos = start_idx + len(result)
1501                        text = text[:start_idx] + result + text[url_end_idx:]
1502                    elif start_idx >= anchor_allowed_pos:
1503                        safe_link = self._safe_protocols.match(url) or url.startswith('#')
1504                        if self.safe_mode and not safe_link:
1505                            result_head = '<a href="#"%s>' % (title_str)
1506                        else:
1507                            result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str)
1508                        result = '%s%s</a>' % (result_head, link_text)
1509                        if "smarty-pants" in self.extras:
1510                            result = result.replace('"', self._escape_table['"'])
1511                        # <img> allowed from curr_pos on, <a> from
1512                        # anchor_allowed_pos on.
1513                        curr_pos = start_idx + len(result_head)
1514                        anchor_allowed_pos = start_idx + len(result)
1515                        text = text[:start_idx] + result + text[url_end_idx:]
1516                    else:
1517                        # Anchor not allowed here.
1518                        curr_pos = start_idx + 1
1519                    continue
1520
1521            # Reference anchor or img?
1522            else:
1523                match = self._tail_of_reference_link_re.match(text, p)
1524                if match:
1525                    # Handle a reference-style anchor or img.
1526                    is_img = start_idx > 0 and text[start_idx-1] == "!"
1527                    if is_img:
1528                        start_idx -= 1
1529                    link_id = match.group("id").lower()
1530                    if not link_id:
1531                        link_id = link_text.lower()  # for links like [this][]
1532                    if link_id in self.urls:
1533                        url = self.urls[link_id]
1534                        # We've got to encode these to avoid conflicting
1535                        # with italics/bold.
1536                        url = url.replace('*', self._escape_table['*']) \
1537                                 .replace('_', self._escape_table['_'])
1538                        title = self.titles.get(link_id)
1539                        if title:
1540                            title = _xml_escape_attr(title) \
1541                                .replace('*', self._escape_table['*']) \
1542                                .replace('_', self._escape_table['_'])
1543                            title_str = ' title="%s"' % title
1544                        else:
1545                            title_str = ''
1546                        if is_img:
1547                            img_class_str = self._html_class_str_from_tag("img")
1548                            result = '<img src="%s" alt="%s"%s%s%s' \
1549                                % (_html_escape_url(url, safe_mode=self.safe_mode),
1550                                   _xml_escape_attr(link_text),
1551                                   title_str,
1552                                   img_class_str,
1553                                   self.empty_element_suffix)
1554                            if "smarty-pants" in self.extras:
1555                                result = result.replace('"', self._escape_table['"'])
1556                            curr_pos = start_idx + len(result)
1557                            text = text[:start_idx] + result + text[match.end():]
1558                        elif start_idx >= anchor_allowed_pos:
1559                            if self.safe_mode and not self._safe_protocols.match(url):
1560                                result_head = '<a href="#"%s>' % (title_str)
1561                            else:
1562                                result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str)
1563                            result = '%s%s</a>' % (result_head, link_text)
1564                            if "smarty-pants" in self.extras:
1565                                result = result.replace('"', self._escape_table['"'])
1566                            # <img> allowed from curr_pos on, <a> from
1567                            # anchor_allowed_pos on.
1568                            curr_pos = start_idx + len(result_head)
1569                            anchor_allowed_pos = start_idx + len(result)
1570                            text = text[:start_idx] + result + text[match.end():]
1571                        else:
1572                            # Anchor not allowed here.
1573                            curr_pos = start_idx + 1
1574                    else:
1575                        # This id isn't defined, leave the markup alone.
1576                        curr_pos = match.end()
1577                    continue
1578
1579            # Otherwise, it isn't markup.
1580            curr_pos = start_idx + 1
1581
1582        return text
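The matching-bracket scan inside `_do_links` can be exercised as a standalone sketch (the function name and the 3000-character sentinel mirror the code above; this is an illustration, not part of markdown2's public API):

```python
def find_closing_bracket(text, start_idx, sentinel=3000):
    """Return the index of the ']' matching the '[' at start_idx, or None.

    Mirrors _do_links: matched nested brackets are allowed inside link
    text, and the scan gives up after `sentinel` characters.
    """
    depth = 0
    for p in range(start_idx + 1, min(start_idx + sentinel, len(text))):
        ch = text[p]
        if ch == ']':
            depth -= 1
            if depth < 0:
                return p
        elif ch == '[':
            depth += 1
    # Closing bracket not found within the sentinel length.
    return None

print(find_closing_bracket("[a [nested] link](url)", 0))  # -> 16
```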
1583
1584    def header_id_from_text(self, text, prefix, n):
1585        """Generate a header id attribute value from the given header
1586        HTML content.
1587
1588        This is only called if the "header-ids" extra is enabled.
1589        Subclasses may override this for different header ids.
1590
1591        @param text {str} The text of the header tag
1592        @param prefix {str} The requested prefix for header ids. This is the
1593            value of the "header-ids" extra key, if any. Otherwise, None.
1594        @param n {int} The <hN> tag number, i.e. `1` for an <h1> tag.
1595        @returns {str} The value for the header tag's "id" attribute. Return
1596            None to not have an id attribute and to exclude this header from
1597            the TOC (if the "toc" extra is specified).
1598        """
1599        header_id = _slugify(text)
1600        if prefix and isinstance(prefix, base_string_type):
1601            header_id = prefix + '-' + header_id
1602
1603        self._count_from_header_id[header_id] += 1
1604        if 0 == len(header_id) or self._count_from_header_id[header_id] > 1:
1605            header_id += '-%s' % self._count_from_header_id[header_id]
1606
1607        return header_id
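A simplified sketch of the id-generation logic above, with the slugify step and the per-document counter made explicit (the `slugify` behavior is an assumption about `_slugify`, and the `prefix` type check is omitted):

```python
import collections
import re

def slugify(value):
    # Assumed behavior of markdown2's _slugify: lowercase, drop
    # punctuation, collapse whitespace/hyphen runs to a single '-'.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)

def header_id_from_text(text, prefix, counts):
    hid = slugify(text)
    if prefix:
        hid = prefix + '-' + hid
    # Repeated (or empty) ids get a numeric suffix, as in the code above.
    counts[hid] += 1
    if not hid or counts[hid] > 1:
        hid += '-%s' % counts[hid]
    return hid

counts = collections.defaultdict(int)
print(header_id_from_text("My Header", None, counts))  # -> my-header
print(header_id_from_text("My Header", None, counts))  # -> my-header-2
```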
1608
1609    def _toc_add_entry(self, level, id, name):
1610        if level > self._toc_depth:
1611            return
1612        if self._toc is None:
1613            self._toc = []
1614        self._toc.append((level, id, self._unescape_special_chars(name)))
1615
1616    _h_re_base = r'''
1617        (^(.+)[ \t]{0,99}\n(=+|-+)[ \t]*\n+)
1618        |
1619        (^(\#{1,6})  # \1 = string of #'s
1620        [ \t]%s
1621        (.+?)       # \2 = Header text
1622        [ \t]{0,99}
1623        (?<!\\)     # ensure not an escaped trailing '#'
1624        \#*         # optional closing #'s (not counted)
1625        \n+
1626        )
1627        '''
1628
1629    _h_re = re.compile(_h_re_base % '*', re.X | re.M)
1630    _h_re_tag_friendly = re.compile(_h_re_base % '+', re.X | re.M)
1631
1632    def _h_sub(self, match):
1633        if match.group(1) is not None and match.group(3) == "-":
1634            return match.group(1)
1635        elif match.group(1) is not None:
1636            # Setext header
1637            n = {"=": 1, "-": 2}[match.group(3)[0]]
1638            header_group = match.group(2)
1639        else:
1640            # atx header
1641            n = len(match.group(5))
1642            header_group = match.group(6)
1643
1644        demote_headers = self.extras.get("demote-headers")
1645        if demote_headers:
1646            n = min(n + demote_headers, 6)
1647        header_id_attr = ""
1648        if "header-ids" in self.extras:
1649            header_id = self.header_id_from_text(header_group,
1650                self.extras["header-ids"], n)
1651            if header_id:
1652                header_id_attr = ' id="%s"' % header_id
1653        html = self._run_span_gamut(header_group)
1654        if "toc" in self.extras and header_id:
1655            self._toc_add_entry(n, header_id, html)
1656        return "<h%d%s>%s</h%d>\n\n" % (n, header_id_attr, html, n)
1657
1658    def _do_headers(self, text):
1659        # Setext-style headers:
1660        #     Header 1
1661        #     ========
1662        #
1663        #     Header 2
1664        #     --------
1665
1666        # atx-style headers:
1667        #   # Header 1
1668        #   ## Header 2
1669        #   ## Header 2 with closing hashes ##
1670        #   ...
1671        #   ###### Header 6
1672
1673        if 'tag-friendly' in self.extras:
1674            return self._h_re_tag_friendly.sub(self._h_sub, text)
1675        return self._h_re.sub(self._h_sub, text)
1676
1677    _marker_ul_chars = '*+-'
1678    _marker_any = r'(?:[%s]|\d+\.)' % _marker_ul_chars
1679    _marker_ul = '(?:[%s])' % _marker_ul_chars
1680    _marker_ol = r'(?:\d+\.)'
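The three marker patterns above, exercised standalone: unordered markers are `*`, `+`, or `-`; ordered markers are one or more digits followed by `.`:

```python
import re

marker_ul = re.compile(r'(?:[*+-])')
marker_ol = re.compile(r'(?:\d+\.)')
marker_any = re.compile(r'(?:[*+-]|\d+\.)')

for line in ('* bullet', '3. numbered', 'plain text'):
    print(line, '->', bool(marker_any.match(line)))
```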
1681
1682    def _list_sub(self, match):
1683        lst = match.group(1)
1684        lst_type = "ul" if match.group(3) in self._marker_ul_chars else "ol"
1685        result = self._process_list_items(lst)
1686        if self.list_level:
1687            return "<%s>\n%s</%s>\n" % (lst_type, result, lst_type)
1688        else:
1689            return "<%s>\n%s</%s>\n\n" % (lst_type, result, lst_type)
1690
1691    def _do_lists(self, text):
1692        # Form HTML ordered (numbered) and unordered (bulleted) lists.
1693
1694        # Iterate over each *non-overlapping* list match.
1695        pos = 0
1696        while True:
1697            # Find the *first* hit for either list style (ul or ol). We
1698            # match ul and ol separately to avoid adjacent lists of different
1699            # types running into each other (see issue #16).
1700            hits = []
1701            for marker_pat in (self._marker_ul, self._marker_ol):
1702                less_than_tab = self.tab_width - 1
1703                whole_list = r'''
1704                    (                   # \1 = whole list
1705                      (                 # \2
1706                        [ ]{0,%d}
1707                        (%s)            # \3 = first list item marker
1708                        [ \t]+
1709                        (?!\ *\3\ )     # '- - - ...' isn't a list. See 'not_quite_a_list' test case.
1710                      )
1711                      (?:.+?)
1712                      (                 # \4
1713                          \Z
1714                        |
1715                          \n{2,}
1716                          (?=\S)
1717                          (?!           # Negative lookahead for another list item marker
1718                            [ \t]*
1719                            %s[ \t]+
1720                          )
1721                      )
1722                    )
1723                ''' % (less_than_tab, marker_pat, marker_pat)
1724                if self.list_level:  # sub-list
1725                    list_re = re.compile("^"+whole_list, re.X | re.M | re.S)
1726                else:
1727                    list_re = re.compile(r"(?:(?<=\n\n)|\A\n?)"+whole_list,
1728                                         re.X | re.M | re.S)
1729                match = list_re.search(text, pos)
1730                if match:
1731                    hits.append((match.start(), match))
1732            if not hits:
1733                break
1734            hits.sort()
1735            match = hits[0][1]
1736            start, end = match.span()
1737            middle = self._list_sub(match)
1738            text = text[:start] + middle + text[end:]
1739            pos = start + len(middle)  # start pos for next attempted match
1740
1741        return text
1742
1743    _list_item_re = re.compile(r'''
1744        (\n)?                   # leading line = \1
1745        (^[ \t]*)               # leading whitespace = \2
1746        (?P<marker>%s) [ \t]+   # list marker = \3
1747        ((?:.+?)                # list item text = \4
1748        (\n{1,2}))              # eols = \5
1749        (?= \n* (\Z | \2 (?P<next_marker>%s) [ \t]+))
1750        ''' % (_marker_any, _marker_any),
1751        re.M | re.X | re.S)
1752
1753    _task_list_item_re = re.compile(r'''
1754        (\[[\ xX]\])[ \t]+       # tasklist marker = \1
1755        (.*)                   # list item text = \2
1756    ''', re.M | re.X | re.S)
1757
1758    _task_list_wrapper_str = r'<input type="checkbox" class="task-list-item-checkbox" %sdisabled> %s'
1759
1760    def _task_list_item_sub(self, match):
1761        marker = match.group(1)
1762        item_text = match.group(2)
1763        if marker in ['[x]', '[X]']:
1764            return self._task_list_wrapper_str % ('checked ', item_text)
1765        elif marker == '[ ]':
1766            return self._task_list_wrapper_str % ('', item_text)
1767
1768    _last_li_endswith_two_eols = False
1769    def _list_item_sub(self, match):
1770        item = match.group(4)
1771        leading_line = match.group(1)
1772        if leading_line or "\n\n" in item or self._last_li_endswith_two_eols:
1773            item = self._run_block_gamut(self._outdent(item))
1774        else:
1775            # Recursion for sub-lists:
1776            item = self._do_lists(self._outdent(item))
1777            if item.endswith('\n'):
1778                item = item[:-1]
1779            item = self._run_span_gamut(item)
1780        self._last_li_endswith_two_eols = (len(match.group(5)) == 2)
1781
1782        if "task_list" in self.extras:
1783            item = self._task_list_item_re.sub(self._task_list_item_sub, item)
1784
1785        return "<li>%s</li>\n" % item
1786
1787    def _process_list_items(self, list_str):
1788        # Process the contents of a single ordered or unordered list,
1789        # splitting it into individual list items.
1790
1791        # The $g_list_level global keeps track of when we're inside a list.
1792        # Each time we enter a list, we increment it; when we leave a list,
1793        # we decrement. If it's zero, we're not in a list anymore.
1794        #
1795        # We do this because when we're not inside a list, we want to treat
1796        # something like this:
1797        #
1798        #       I recommend upgrading to version
1799        #       8. Oops, now this line is treated
1800        #       as a sub-list.
1801        #
1802        # As a single paragraph, despite the fact that the second line starts
1803        # with a digit-period-space sequence.
1804        #
1805        # Whereas when we're inside a list (or sub-list), that line will be
1806        # treated as the start of a sub-list. What a kludge, huh? This is
1807        # an aspect of Markdown's syntax that's hard to parse perfectly
1808        # without resorting to mind-reading. Perhaps the solution is to
1809        # change the syntax rules such that sub-lists must start with a
1810        # starting cardinal number; e.g. "1." or "a.".
1811        self.list_level += 1
1812        self._last_li_endswith_two_eols = False
1813        list_str = list_str.rstrip('\n') + '\n'
1814        list_str = self._list_item_re.sub(self._list_item_sub, list_str)
1815        self.list_level -= 1
1816        return list_str
1817
1818    def _get_pygments_lexer(self, lexer_name):
1819        try:
1820            from pygments import lexers, util
1821        except ImportError:
1822            return None
1823        try:
1824            return lexers.get_lexer_by_name(lexer_name)
1825        except util.ClassNotFound:
1826            return None
1827
1828    def _color_with_pygments(self, codeblock, lexer, **formatter_opts):
1829        import pygments
1830        import pygments.formatters
1831
1832        class HtmlCodeFormatter(pygments.formatters.HtmlFormatter):
1833            def _wrap_code(self, inner):
1834                """A function for use in a Pygments Formatter which
1835                wraps in <code> tags.
1836                """
1837                yield 0, "<code>"
1838                for tup in inner:
1839                    yield tup
1840                yield 0, "</code>"
1841
1842            def wrap(self, source, outfile=None):
1843                """Return the source with a code, pre, and div."""
1844                if outfile is None:
1845                    # pygments >= 2.12
1846                    return self._wrap_pre(self._wrap_code(source))
1847                else:
1848                    # pygments < 2.12
1849                    return self._wrap_div(self._wrap_pre(self._wrap_code(source)))
1850
1851        formatter_opts.setdefault("cssclass", "codehilite")
1852        formatter = HtmlCodeFormatter(**formatter_opts)
1853        return pygments.highlight(codeblock, lexer, formatter)
1854
1855    def _code_block_sub(self, match, is_fenced_code_block=False):
1856        lexer_name = None
1857        if is_fenced_code_block:
1858            lexer_name = match.group(2)
1859            if lexer_name:
1860                formatter_opts = self.extras['fenced-code-blocks'] or {}
1861            codeblock = match.group(3)
1862            codeblock = codeblock[:-1]  # drop one trailing newline
1863        else:
1864            codeblock = match.group(1)
1865            codeblock = self._outdent(codeblock)
1866            codeblock = self._detab(codeblock)
1867            codeblock = codeblock.lstrip('\n')  # trim leading newlines
1868            codeblock = codeblock.rstrip()      # trim trailing whitespace
1869
1870            # Note: "code-color" extra is DEPRECATED.
1871            if "code-color" in self.extras and codeblock.startswith(":::"):
1872                lexer_name, rest = codeblock.split('\n', 1)
1873                lexer_name = lexer_name[3:].strip()
1874                codeblock = rest.lstrip("\n")   # Remove lexer declaration line.
1875                formatter_opts = self.extras['code-color'] or {}
1876
1877        # Use pygments only if not using the highlightjs-lang extra
1878        if lexer_name and "highlightjs-lang" not in self.extras:
1879            def unhash_code(codeblock):
1880                for key, sanitized in list(self.html_spans.items()):
1881                    codeblock = codeblock.replace(key, sanitized)
1882                replacements = [
1883                    ("&amp;", "&"),
1884                    ("&lt;", "<"),
1885                    ("&gt;", ">")
1886                ]
1887                for old, new in replacements:
1888                    codeblock = codeblock.replace(old, new)
1889                return codeblock
1890            lexer = self._get_pygments_lexer(lexer_name)
1891            if lexer:
1892                codeblock = unhash_code(codeblock)
1893                colored = self._color_with_pygments(codeblock, lexer,
1894                                                    **formatter_opts)
1895                return "\n\n%s\n\n" % colored
1896
1897        codeblock = self._encode_code(codeblock)
1898        pre_class_str = self._html_class_str_from_tag("pre")
1899
1900        if "highlightjs-lang" in self.extras and lexer_name:
1901            code_class_str = ' class="%s language-%s"' % (lexer_name, lexer_name)
1902        else:
1903            code_class_str = self._html_class_str_from_tag("code")
1904
1905        return "\n\n<pre%s><code%s>%s\n</code></pre>\n\n" % (
1906            pre_class_str, code_class_str, codeblock)
1907
1908    def _html_class_str_from_tag(self, tag):
1909        """Get the appropriate ' class="..."' string (note the leading
1910        space), if any, for the given tag.
1911        """
1912        if "html-classes" not in self.extras:
1913            return ""
1914        try:
1915            html_classes_from_tag = self.extras["html-classes"]
1916        except TypeError:
1917            return ""
1918        else:
1919            if tag in html_classes_from_tag:
1920                return ' class="%s"' % html_classes_from_tag[tag]
1921        return ""
1922
1923    def _do_code_blocks(self, text):
1924        """Process Markdown `<pre><code>` blocks."""
1925        code_block_re = re.compile(r'''
1926            (?:\n\n|\A\n?)
1927            (               # $1 = the code block -- one or more lines, starting with a space/tab
1928              (?:
1929                (?:[ ]{%d} | \t)  # Lines must start with a tab or a tab-width of spaces
1930                .*\n+
1931              )+
1932            )
1933            ((?=^[ ]{0,%d}\S)|\Z)   # Lookahead for non-space at line-start, or end of doc
1934            # Lookahead to make sure this block isn't already in a code block.
1935            # Needed when syntax highlighting is being used.
1936            (?!([^<]|<(/?)span)*\</code\>)
1937            ''' % (self.tab_width, self.tab_width),
1938            re.M | re.X)
1939        return code_block_re.sub(self._code_block_sub, text)
1940
1941    _fenced_code_block_re = re.compile(r'''
1942        (?:\n+|\A\n?|(?<=\n))
1943        (^`{3,})\s{0,99}?([\w+-]+)?\s{0,99}?\n  # $1 = opening fence (captured for back-referencing), $2 = optional lang
1944        (.*?)                             # $3 = code block content
1945        \1[ \t]*\n                      # closing fence
1946        ''', re.M | re.X | re.S)
1947
1948    def _fenced_code_block_sub(self, match):
1949        return self._code_block_sub(match, is_fenced_code_block=True)
1950
1951    def _do_fenced_code_blocks(self, text):
1952        """Process ```-fenced unindented code blocks ('fenced-code-blocks' extra)."""
1953        return self._fenced_code_block_re.sub(self._fenced_code_block_sub, text)
1954
1955    # Rules for a code span:
1956    # - backslash escapes are not interpreted in a code span
1957    # - to include a backtick or a run of backticks, the delimiters must
1958    #   be a longer run of backticks
1959    # - cannot start or end a code span with a backtick; pad with a
1960    #   space and that space will be removed in the emitted HTML
1961    # See `test/tm-cases/escapes.text` for a number of edge-case
1962    # examples.
1963    _code_span_re = re.compile(r'''
1964            (?<!\\)
1965            (`+)        # \1 = Opening run of `
1966            (?!`)       # See 'Note A' in test/tm-cases/escapes.text
1967            (.+?)       # \2 = The code block
1968            (?<!`)
1969            \1          # Matching closer
1970            (?!`)
1971        ''', re.X | re.S)
1972
1973    def _code_span_sub(self, match):
1974        c = match.group(2).strip(" \t")
1975        c = self._encode_code(c)
1976        return "<code>%s</code>" % c
1977
1978    def _do_code_spans(self, text):
1979        #   *   Backtick quotes are used for <code></code> spans.
1980        #
1981        #   *   You can use multiple backticks as the delimiters if you want to
1982        #       include literal backticks in the code span. So, this input:
1983        #
1984        #         Just type ``foo `bar` baz`` at the prompt.
1985        #
1986        #       Will translate to:
1987        #
1988        #         <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
1989        #
1990        #       There's no arbitrary limit to the number of backticks you
1991        #       can use as delimiters. If you need three consecutive backticks
1992        #       in your code, use four for delimiters, etc.
1993        #
1994        #   *   You can use spaces to get literal backticks at the edges:
1995        #
1996        #         ... type `` `bar` `` ...
1997        #
1998        #       Turns to:
1999        #
2000        #         ... type <code>`bar`</code> ...
2001        return self._code_span_re.sub(self._code_span_sub, text)
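The rules described in the comment above can be demonstrated with the same regex, minus the `_encode_code` hashing step (this sketch emits `<code>` tags directly instead of escaping and hashing the contents):

```python
import re

# Same pattern as _code_span_re above: a longer backtick run delimits
# spans that contain backticks; edge spaces/tabs are stripped.
code_span_re = re.compile(r'''
        (?<!\\)
        (`+)        # \1 = opening run of `
        (?!`)
        (.+?)       # \2 = the code span
        (?<!`)
        \1          # matching closer
        (?!`)
    ''', re.X | re.S)

def code_span_sub(match):
    return "<code>%s</code>" % match.group(2).strip(" \t")

print(code_span_re.sub(code_span_sub, 'type ``foo `bar` baz`` now'))
# -> type <code>foo `bar` baz</code> now
```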
2002
2003    def _encode_code(self, text):
2004        """Encode/escape certain characters inside Markdown code runs.
2005        The point is that in code, these characters are literals,
2006        and lose their special Markdown meanings.
2007        """
2008        replacements = [
2009            # Encode all ampersands; HTML entities are not
2010            # entities within a Markdown code span.
2011            ('&', '&amp;'),
2012            # Do the angle bracket song and dance:
2013            ('<', '&lt;'),
2014            ('>', '&gt;'),
2015        ]
2016        for before, after in replacements:
2017            text = text.replace(before, after)
2018        hashed = _hash_text(text)
2019        self._code_table[text] = hashed
2020        return hashed
2021
2022    _strike_re = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~", re.S)
2023    def _do_strike(self, text):
2024        text = self._strike_re.sub(r"<strike>\1</strike>", text)
2025        return text
2026
2027    _underline_re = re.compile(r"--(?=\S)(.+?)(?<=\S)--", re.S)
2028    def _do_underline(self, text):
2029        text = self._underline_re.sub(r"<u>\1</u>", text)
2030        return text
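The two substitutions above, exercised standalone. The lookarounds require non-space characters just inside the delimiters, so e.g. `~~ x ~~` is left alone:

```python
import re

# Same patterns as _strike_re and _underline_re above.
strike_re = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~", re.S)
underline_re = re.compile(r"--(?=\S)(.+?)(?<=\S)--", re.S)

print(strike_re.sub(r"<strike>\1</strike>", "~~gone~~ stays"))
# -> <strike>gone</strike> stays
print(underline_re.sub(r"<u>\1</u>", "--under-- over"))
# -> <u>under</u> over
```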
2031
2032    _strong_re = re.compile(r"(\*\*|__)(?=\S)(.+?[*_]*)(?<=\S)\1", re.S)
2033    _em_re = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1", re.S)
2034    _code_friendly_strong_re = re.compile(r"\*\*(?=\S)(.+?[*_]*)(?<=\S)\*\*", re.S)
2035    _code_friendly_em_re = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*", re.S)
2036    def _do_italics_and_bold(self, text):
2037        # <strong> must go first:
2038        if "code-friendly" in self.extras:
2039            text = self._code_friendly_strong_re.sub(r"<strong>\1</strong>", text)
2040            text = self._code_friendly_em_re.sub(r"<em>\1</em>", text)
2041        else:
2042            text = self._strong_re.sub(r"<strong>\2</strong>", text)
2043            text = self._em_re.sub(r"<em>\2</em>", text)
2044        return text
2045
2046    # "smarty-pants" extra: Very liberal in interpreting a single prime as an
2047    # apostrophe; e.g. ignores the fact that "round", "bout", "twer", and
2048    # "twixt" can be written without an initial apostrophe. This is fine because
2049    # using scare quotes (single quotation marks) is rare.
2050    _apostrophe_year_re = re.compile(r"'(\d\d)(?=(\s|,|;|\.|\?|!|$))")
2051    _contractions = ["tis", "twas", "twer", "neath", "o", "n",
2052        "round", "bout", "twixt", "nuff", "fraid", "sup"]
2053    def _do_smart_contractions(self, text):
2054        text = self._apostrophe_year_re.sub(r"&#8217;\1", text)
2055        for c in self._contractions:
2056            text = text.replace("'%s" % c, "&#8217;%s" % c)
2057            text = text.replace("'%s" % c.capitalize(),
2058                "&#8217;%s" % c.capitalize())
2059        return text
2060
2061    # Substitute double-quotes before single-quotes.
2062    _opening_single_quote_re = re.compile(r"(?<!\S)'(?=\S)")
2063    _opening_double_quote_re = re.compile(r'(?<!\S)"(?=\S)')
2064    _closing_single_quote_re = re.compile(r"(?<=\S)'")
2065    _closing_double_quote_re = re.compile(r'(?<=\S)"(?=(\s|,|;|\.|\?|!|$))')
2066    def _do_smart_punctuation(self, text):
2067        """Fancifies 'single quotes', "double quotes", and apostrophes.
2068        Converts --, ---, and ... into en dashes, em dashes, and ellipses.
2069
2070        Inspiration is: <http://daringfireball.net/projects/smartypants/>
2071        See "test/tm-cases/smarty_pants.text" for a full discussion of the
2072        support here and
2073        <http://code.google.com/p/python-markdown2/issues/detail?id=42> for a
2074        discussion of some diversion from the original SmartyPants.
2075        """
2076        if "'" in text:  # guard for perf
2077            text = self._do_smart_contractions(text)
2078            text = self._opening_single_quote_re.sub("&#8216;", text)
2079            text = self._closing_single_quote_re.sub("&#8217;", text)
2080
2081        if '"' in text:  # guard for perf
2082            text = self._opening_double_quote_re.sub("&#8220;", text)
2083            text = self._closing_double_quote_re.sub("&#8221;", text)
2084
2085        text = text.replace("---", "&#8212;")
2086        text = text.replace("--", "&#8211;")
2087        text = text.replace("...", "&#8230;")
2088        text = text.replace(" . . . ", "&#8230;")
2089        text = text.replace(". . .", "&#8230;")
2090
2091        # TODO: Temporary hack to fix https://github.com/trentm/python-markdown2/issues/150
2092        if "footnotes" in self.extras and "footnote-ref" in text:
2093            # Quotes in the footnote back ref get converted to "smart" quotes
2094            # Change them back here to ensure they work.
2095            text = text.replace('class="footnote-ref&#8221;', 'class="footnote-ref"')
2096
2097        return text
2098
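The smart-punctuation substitutions above can be exercised standalone. A minimal sketch, re-compiling the same quote patterns outside the class (contraction handling omitted):

```python
import re

# The same patterns as _opening/_closing_*_quote_re above.
opening_single = re.compile(r"(?<!\S)'(?=\S)")
opening_double = re.compile(r'(?<!\S)"(?=\S)')
closing_single = re.compile(r"(?<=\S)'")
closing_double = re.compile(r'(?<=\S)"(?=(\s|,|;|\.|\?|!|$))')

text = 'He said "hello" and \'bye\' -- then left...'
text = opening_single.sub("&#8216;", text)   # singles first, as in the method
text = closing_single.sub("&#8217;", text)
text = opening_double.sub("&#8220;", text)
text = closing_double.sub("&#8221;", text)
text = text.replace("---", "&#8212;").replace("--", "&#8211;")
text = text.replace("...", "&#8230;")
print(text)
# He said &#8220;hello&#8221; and &#8216;bye&#8217; &#8211; then left&#8230;
```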
2099    _block_quote_base = r'''
2100        (                           # Wrap whole match in \1
2101          (
2102            ^[ \t]*>%s[ \t]?        # '>' at the start of a line
2103              .+\n                  # rest of the first line
2104            (.+\n)*                 # subsequent consecutive lines
2105          )+
2106        )
2107    '''
2108    _block_quote_re = re.compile(_block_quote_base % '', re.M | re.X)
2109    _block_quote_re_spoiler = re.compile(_block_quote_base % '[ \t]*?!?', re.M | re.X)
2110    _bq_one_level_re = re.compile('^[ \t]*>[ \t]?', re.M)
2111    _bq_one_level_re_spoiler = re.compile('^[ \t]*>[ \t]*?![ \t]?', re.M)
2112    _bq_all_lines_spoilers = re.compile(r'\A(?:^[ \t]*>[ \t]*?!.*[\n\r]*)+\Z', re.M)
2113    _html_pre_block_re = re.compile(r'(\s*<pre>.+?</pre>)', re.S)
2114    def _dedent_two_spaces_sub(self, match):
2115        return re.sub(r'(?m)^  ', '', match.group(1))
2116
2117    def _block_quote_sub(self, match):
2118        bq = match.group(1)
2119        is_spoiler = 'spoiler' in self.extras and self._bq_all_lines_spoilers.match(bq)
2120        # trim one level of quoting
2121        if is_spoiler:
2122            bq = self._bq_one_level_re_spoiler.sub('', bq)
2123        else:
2124            bq = self._bq_one_level_re.sub('', bq)
2125        # trim whitespace-only lines
2126        bq = self._ws_only_line_re.sub('', bq)
2127        bq = self._run_block_gamut(bq)          # recurse
2128
2129        bq = re.sub('(?m)^', '  ', bq)
2130        # These leading spaces screw with <pre> content, so we need to fix that:
2131        bq = self._html_pre_block_re.sub(self._dedent_two_spaces_sub, bq)
2132
2133        if is_spoiler:
2134            return '<blockquote class="spoiler">\n%s\n</blockquote>\n\n' % bq
2135        else:
2136            return '<blockquote>\n%s\n</blockquote>\n\n' % bq
2137
2138    def _do_block_quotes(self, text):
2139        if '>' not in text:
2140            return text
2141        if 'spoiler' in self.extras:
2142            return self._block_quote_re_spoiler.sub(self._block_quote_sub, text)
2143        else:
2144            return self._block_quote_re.sub(self._block_quote_sub, text)
2145
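A standalone sketch of the block-quote matching and one-level trim above (same patterns, compiled outside the class; the spoiler variant is omitted):

```python
import re

# _block_quote_base % '' and _bq_one_level_re from above.
block_quote_re = re.compile(r'''
    (                           # Wrap whole match in \1
      (
        ^[ \t]*>[ \t]?          # '>' at the start of a line
          .+\n                  # rest of the first line
        (.+\n)*                 # subsequent consecutive lines
      )+
    )
''', re.M | re.X)
bq_one_level_re = re.compile('^[ \t]*>[ \t]?', re.M)

text = "> quoted line\n> second line\n\nnot quoted\n"
m = block_quote_re.search(text)
quoted = m.group(1)                      # the whole quoted run
trimmed = bq_one_level_re.sub('', quoted)  # one quoting level removed
print(repr(quoted))
print(repr(trimmed))
```

The match stops at the blank line because `.+\n` requires at least one non-newline character, which is what keeps `not quoted` out of the blockquote.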
2146    def _form_paragraphs(self, text):
2147        # Strip leading and trailing lines:
2148        text = text.strip('\n')
2149
2150        # Wrap <p> tags.
2151        grafs = []
2152        for i, graf in enumerate(re.split(r"\n{2,}", text)):
2153            if graf in self.html_blocks:
2154                # Unhashify HTML blocks
2155                grafs.append(self.html_blocks[graf])
2156            else:
2157                cuddled_list = None
2158                if "cuddled-lists" in self.extras:
2159                    # Need to put back trailing '\n' for `_list_item_re`
2160                    # match at the end of the paragraph.
2161                    li = self._list_item_re.search(graf + '\n')
2162                    # Two of the same list marker in this paragraph: a likely
2163                    # candidate for a list cuddled to preceding paragraph
2164                    # text (issue 33). Note the `[-1]` is a quick way to
2165                    # consider numeric bullets (e.g. "1." and "2.") to be
2166                    # equal.
2167                    if (li and len(li.group(2)) <= 3
2168                            and (
2169                                    (li.group("next_marker") and li.group("marker")[-1] == li.group("next_marker")[-1])
2170                                    or
2171                                    li.group("next_marker") is None
2172                            )
2173                    ):
2174                        start = li.start()
2175                        cuddled_list = self._do_lists(graf[start:]).rstrip("\n")
2176                        assert cuddled_list.startswith("<ul>") or cuddled_list.startswith("<ol>")
2177                        graf = graf[:start]
2178
2179                # Wrap <p> tags.
2180                graf = self._run_span_gamut(graf)
2181                grafs.append("<p%s>" % self._html_class_str_from_tag('p') + graf.lstrip(" \t") + "</p>")
2182
2183                if cuddled_list:
2184                    grafs.append(cuddled_list)
2185
2186        return "\n\n".join(grafs)
2187
2188    def _add_footnotes(self, text):
2189        if self.footnotes:
2190            footer = [
2191                '<div class="footnotes">',
2192                '<hr' + self.empty_element_suffix,
2193                '<ol>',
2194            ]
2195
2196            if not self.footnote_title:
2197                self.footnote_title = "Jump back to footnote %d in the text."
2198            if not self.footnote_return_symbol:
2199                self.footnote_return_symbol = "&#8617;"
2200
2201            for i, id in enumerate(self.footnote_ids):
2202                if i != 0:
2203                    footer.append('')
2204                footer.append('<li id="fn-%s">' % id)
2205                footer.append(self._run_block_gamut(self.footnotes[id]))
2206                try:
2207                    backlink = ('<a href="#fnref-%s" ' +
2208                            'class="footnoteBackLink" ' +
2209                            'title="' + self.footnote_title + '">' +
2210                            self.footnote_return_symbol +
2211                            '</a>') % (id, i+1)
2212                except TypeError:
2213                    log.debug("Footnote error. `footnote_title` "
2214                              "must include parameter. Using defaults.")
2215                    backlink = ('<a href="#fnref-%s" '
2216                        'class="footnoteBackLink" '
2217                        'title="Jump back to footnote %d in the text.">'
2218                        '&#8617;</a>' % (id, i+1))
2219
2220                if footer[-1].endswith("</p>"):
2221                    footer[-1] = footer[-1][:-len("</p>")] \
2222                        + '&#160;' + backlink + "</p>"
2223                else:
2224                    footer.append("\n<p>%s</p>" % backlink)
2225                footer.append('</li>')
2226            footer.append('</ol>')
2227            footer.append('</div>')
2228            return text + '\n\n' + '\n'.join(footer)
2229        else:
2230            return text
2231
2232    _naked_lt_re = re.compile(r'<(?![a-z/?\$!])', re.I)
2233    _naked_gt_re = re.compile(r'''(?<![a-z0-9?!/'"-])>''', re.I)
2234
2235    def _encode_amps_and_angles(self, text):
2236        # Smart processing for ampersands and angle brackets that need
2237        # to be encoded.
2238        text = _AMPERSAND_RE.sub('&amp;', text)
2239
2240        # Encode naked <'s
2241        text = self._naked_lt_re.sub('&lt;', text)
2242
2243        # Encode naked >'s
2244        # Note: Other markdown implementations (e.g. Markdown.pl, PHP
2245        # Markdown) don't do this.
2246        text = self._naked_gt_re.sub('&gt;', text)
2247        return text
2248
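The "naked" angle-bracket logic above can be demonstrated standalone. In this sketch, `amp_re` is an approximation of `_AMPERSAND_RE` (defined earlier in the module): an `&` not already starting an entity.

```python
import re

amp_re = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')  # approximation
naked_lt_re = re.compile(r'<(?![a-z/?\$!])', re.I)          # same as above
naked_gt_re = re.compile(r'''(?<![a-z0-9?!/'"-])>''', re.I) # same as above

text = "4 < 5 & 6 > 3, but <em>kept</em> &amp; kept"
text = amp_re.sub('&amp;', text)      # bare & encoded, &amp; left alone
text = naked_lt_re.sub('&lt;', text)  # < before a space encoded, <em> kept
text = naked_gt_re.sub('&gt;', text)  # > after a space encoded, tag > kept
print(text)
# 4 &lt; 5 &amp; 6 &gt; 3, but <em>kept</em> &amp; kept
```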
2249    _incomplete_tags_re = re.compile(r"<(/?\w+?(?!\w).+?[\s/]+?)")
2250
2251    def _encode_incomplete_tags(self, text):
2252        if self.safe_mode not in ("replace", "escape"):
2253            return text
2254
2255        if text.endswith(">"):
2256            return text  # this is not an incomplete tag, this is a link in the form <http://x.y.z>
2257
2258        return self._incomplete_tags_re.sub("&lt;\\1", text)
2259
2260    def _encode_backslash_escapes(self, text):
2261        for ch, escape in list(self._escape_table.items()):
2262            text = text.replace("\\"+ch, escape)
2263        return text
2264
2265    _auto_link_re = re.compile(r'<((https?|ftp):[^\'">\s]+)>', re.I)
2266    def _auto_link_sub(self, match):
2267        g1 = match.group(1)
2268        return '<a href="%s">%s</a>' % (g1, g1)
2269
2270    _auto_email_link_re = re.compile(r"""
2271          <
2272           (?:mailto:)?
2273          (
2274              [-.\w]+
2275              \@
2276              [-\w]+(\.[-\w]+)*\.[a-z]+
2277          )
2278          >
2279        """, re.I | re.X | re.U)
2280    def _auto_email_link_sub(self, match):
2281        return self._encode_email_address(
2282            self._unescape_special_chars(match.group(1)))
2283
2284    def _do_auto_links(self, text):
2285        text = self._auto_link_re.sub(self._auto_link_sub, text)
2286        text = self._auto_email_link_re.sub(self._auto_email_link_sub, text)
2287        return text
2288
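The auto-link substitution above, as a self-contained sketch:

```python
import re

# The same _auto_link_re / _auto_link_sub pair as above.
auto_link_re = re.compile(r'<((https?|ftp):[^\'">\s]+)>', re.I)

def auto_link_sub(match):
    g1 = match.group(1)
    return '<a href="%s">%s</a>' % (g1, g1)

text = "See <https://example.com/a?b=1> for details."
result = auto_link_re.sub(auto_link_sub, text)
print(result)
```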
2289    def _encode_email_address(self, addr):
2290        #  Input: an email address, e.g. "foo@example.com"
2291        #
2292        #  Output: the email address as a mailto link, with each character
2293        #      of the address encoded as either a decimal or hex entity, in
2294        #      the hopes of foiling most address harvesting spam bots. E.g.:
2295        #
2296        #    <a href="&#x6D;&#97;&#105;&#108;&#x74;&#111;:&#102;&#111;&#111;&#64;&#101;
2297        #       x&#x61;&#109;&#x70;&#108;&#x65;&#x2E;&#99;&#111;&#109;">&#102;&#111;&#111;
2298        #       &#64;&#101;x&#x61;&#109;&#x70;&#108;&#x65;&#x2E;&#99;&#111;&#109;</a>
2299        #
2300        #  Based on a filter by Matthew Wickline, posted to the BBEdit-Talk
2301        #  mailing list: <http://tinyurl.com/yu7ue>
2302        chars = [_xml_encode_email_char_at_random(ch)
2303                 for ch in "mailto:" + addr]
2304        # Strip the mailto: from the visible part.
2305        addr = '<a href="%s">%s</a>' \
2306               % (''.join(chars), ''.join(chars[7:]))
2307        return addr
2308
2309    def _do_link_patterns(self, text):
2310        link_from_hash = {}
2311        for regex, repl in self.link_patterns:
2312            replacements = []
2313            for match in regex.finditer(text):
2314                if hasattr(repl, "__call__"):
2315                    href = repl(match)
2316                else:
2317                    href = match.expand(repl)
2318                replacements.append((match.span(), href))
2319            for (start, end), href in reversed(replacements):
2320
2321                # Do not match against links inside brackets.
2322                if text[start - 1:start] == '[' and text[end:end + 1] == ']':
2323                    continue
2324
2325                # Do not match against links in the standard markdown syntax.
2326                if text[start - 2:start] == '](' or text[end:end + 2] == '")':
2327                    continue
2328
2329                # Do not match against links which are escaped.
2330                if text[start - 3:start] == '"""' and text[end:end + 3] == '"""':
2331                    text = text[:start - 3] + text[start:end] + text[end + 3:]
2332                    continue
2333
2334                escaped_href = (
2335                    href.replace('"', '&quot;')  # b/c of attr quote
2336                        # To avoid markdown <em> and <strong>:
2337                        .replace('*', self._escape_table['*'])
2338                        .replace('_', self._escape_table['_']))
2339                link = '<a href="%s">%s</a>' % (escaped_href, text[start:end])
2340                hash = _hash_text(link)
2341                link_from_hash[hash] = link
2342                text = text[:start] + hash + text[end:]
2343        for hash, link in list(link_from_hash.items()):
2344            text = text.replace(hash, link)
2345        return text
2346
2347    def _unescape_special_chars(self, text):
2348        # Swap back in all the special characters we've hidden.
2349        for ch, hash in list(self._escape_table.items()) + list(self._code_table.items()):
2350            text = text.replace(hash, ch)
2351        return text
2352
2353    def _outdent(self, text):
2354        # Remove one level of line-leading tabs or spaces
2355        return self._outdent_re.sub('', text)
2356
2357
2358class MarkdownWithExtras(Markdown):
2359    """A markdowner class that enables most extras:
2360
2361    - footnotes
2362    - code-color (only has effect if 'pygments' Python module on path)
2363
2364    These are not included:
2365    - pyshell (specific to Python-related documenting)
2366    - code-friendly (because it *disables* part of the syntax)
2367    - link-patterns (because you need to specify some actual
2368      link-patterns anyway)
2369    """
2370    extras = ["footnotes", "code-color"]
2371
2372
2373# ---- internal support functions
2374
2375
2376def calculate_toc_html(toc):
2377    """Return the HTML for the given TOC.
2378
2379    "toc" is a list of (level, id, name) tuples, e.g. as collected by the "toc" extra.
2380    """
2381    if toc is None:
2382        return None
2383
2384    def indent():
2385        return '  ' * (len(h_stack) - 1)
2386    lines = []
2387    h_stack = [0]   # stack of header-level numbers
2388    for level, id, name in toc:
2389        if level > h_stack[-1]:
2390            lines.append("%s<ul>" % indent())
2391            h_stack.append(level)
2392        elif level == h_stack[-1]:
2393            lines[-1] += "</li>"
2394        else:
2395            while level < h_stack[-1]:
2396                h_stack.pop()
2397                if not lines[-1].endswith("</li>"):
2398                    lines[-1] += "</li>"
2399                lines.append("%s</ul></li>" % indent())
2400        lines.append('%s<li><a href="#%s">%s</a>' % (
2401            indent(), id, name))
2402    while len(h_stack) > 1:
2403        h_stack.pop()
2404        if not lines[-1].endswith("</li>"):
2405            lines[-1] += "</li>"
2406        lines.append("%s</ul>" % indent())
2407    return '\n'.join(lines) + '\n'
2408
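Fed the `(level, id, name)` tuples that the "toc" extra collects, the function produces nested `<ul>` lists. A verbatim standalone copy, to show the nesting for a two-level TOC:

```python
def calc_toc_html(toc):
    # Same algorithm as calculate_toc_html above.
    def indent():
        return '  ' * (len(h_stack) - 1)
    lines = []
    h_stack = [0]   # stack of header-level numbers
    for level, id, name in toc:
        if level > h_stack[-1]:
            lines.append("%s<ul>" % indent())
            h_stack.append(level)
        elif level == h_stack[-1]:
            lines[-1] += "</li>"
        else:
            while level < h_stack[-1]:
                h_stack.pop()
                if not lines[-1].endswith("</li>"):
                    lines[-1] += "</li>"
                lines.append("%s</ul></li>" % indent())
        lines.append('%s<li><a href="#%s">%s</a>' % (indent(), id, name))
    while len(h_stack) > 1:
        h_stack.pop()
        if not lines[-1].endswith("</li>"):
            lines[-1] += "</li>"
        lines.append("%s</ul>" % indent())
    return '\n'.join(lines) + '\n'

html = calc_toc_html([(1, 'intro', 'Introduction'), (2, 'setup', 'Setup')])
print(html)
```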
2409
2410class UnicodeWithAttrs(unicode):
2411    """A subclass of unicode used as the return value of conversion, so
2412    that attributes can be attached to it -- e.g. the "toc_html"
2413    attribute when the "toc" extra is used.
2414    """
2415    metadata = None
2416    toc_html = None
2417
2418## {{{ http://code.activestate.com/recipes/577257/ (r1)
2419_slugify_strip_re = re.compile(r'[^\w\s-]')
2420_slugify_hyphenate_re = re.compile(r'[-\s]+')
2421def _slugify(value):
2422    """
2423    Normalizes string, converts to lowercase, removes non-alpha characters,
2424    and converts spaces to hyphens.
2425
2426    From Django's "django/template/defaultfilters.py".
2427    """
2428    import unicodedata
2429    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode()
2430    value = _slugify_strip_re.sub('', value).strip().lower()
2431    return _slugify_hyphenate_re.sub('-', value)
2432## end of http://code.activestate.com/recipes/577257/ }}}
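The slugify recipe above, packaged as a standalone sketch:

```python
import re
import unicodedata

slugify_strip_re = re.compile(r'[^\w\s-]')     # same patterns as above
slugify_hyphenate_re = re.compile(r'[-\s]+')

def slugify(value):
    # NFKD-decompose, drop non-ASCII, strip punctuation, hyphenate spaces.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode()
    value = slugify_strip_re.sub('', value).strip().lower()
    return slugify_hyphenate_re.sub('-', value)

print(slugify("Héllo,  Wörld!"))  # → hello-world
```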
2433
2434
2435# From http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52549
2436def _curry(*args, **kwargs):
2437    function, args = args[0], args[1:]
2438    def result(*rest, **kwrest):
2439        combined = kwargs.copy()
2440        combined.update(kwrest)
2441        return function(*args + rest, **combined)
2442    return result
2443
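The currying recipe above fixes leading positional arguments and default keyword arguments. A standalone sketch of the same recipe:

```python
def curry(*args, **kwargs):
    # Same recipe as _curry above.
    function, args = args[0], args[1:]
    def result(*rest, **kwrest):
        combined = kwargs.copy()
        combined.update(kwrest)
        return function(*args + rest, **combined)
    return result

double = curry(pow, 2)          # fixes the base argument
print(double(10))               # pow(2, 10) → 1024
abs_sorted = curry(sorted, key=abs)
print(abs_sorted([-3, 1, -2]))  # → [1, -2, -3]
```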
2444
2445# Recipe: regex_from_encoded_pattern (1.0)
2446def _regex_from_encoded_pattern(s):
2447    """'foo'    -> re.compile(re.escape('foo'))
2448       '/foo/'  -> re.compile('foo')
2449       '/foo/i' -> re.compile('foo', re.I)
2450    """
2451    if s.startswith('/') and s.rfind('/') != 0:
2452        # Parse it: /PATTERN/FLAGS
2453        idx = s.rfind('/')
2454        _, flags_str = s[1:idx], s[idx+1:]
2455        flag_from_char = {
2456            "i": re.IGNORECASE,
2457            "l": re.LOCALE,
2458            "s": re.DOTALL,
2459            "m": re.MULTILINE,
2460            "u": re.UNICODE,
2461        }
2462        flags = 0
2463        for char in flags_str:
2464            try:
2465                flags |= flag_from_char[char]
2466            except KeyError:
2467                raise ValueError("unsupported regex flag: '%s' in '%s' "
2468                                 "(must be one of '%s')"
2469                                 % (char, s, ''.join(list(flag_from_char.keys()))))
2470        return re.compile(s[1:idx], flags)
2471    else:  # not an encoded regex
2472        return re.compile(re.escape(s))
2473
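The encoded-pattern recipe above turns `'foo'` into a literal match and `'/foo/i'` into a flagged regex. A trimmed standalone copy:

```python
import re

def regex_from_encoded_pattern(s):
    # Same recipe as above: 'foo' -> literal, '/foo/i' -> flagged regex.
    if s.startswith('/') and s.rfind('/') != 0:
        idx = s.rfind('/')
        flag_from_char = {"i": re.IGNORECASE, "l": re.LOCALE, "s": re.DOTALL,
                          "m": re.MULTILINE, "u": re.UNICODE}
        flags = 0
        for char in s[idx+1:]:
            flags |= flag_from_char[char]  # KeyError -> unsupported flag
        return re.compile(s[1:idx], flags)
    return re.compile(re.escape(s))  # not an encoded regex: match literally

print(bool(regex_from_encoded_pattern('/foo/i').match('FOO')))  # True
print(bool(regex_from_encoded_pattern('a.b').match('axb')))     # False ('.' escaped)
```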
2474
2475# Recipe: dedent (0.1.2)
2476def _dedentlines(lines, tabsize=8, skip_first_line=False):
2477    """_dedentlines(lines, tabsize=8, skip_first_line=False) -> dedented lines
2478
2479        "lines" is a list of lines to dedent.
2480        "tabsize" is the tab width to use for indent width calculations.
2481        "skip_first_line" is a boolean indicating if the first line should
2482            be skipped for calculating the indent width and for dedenting.
2483            This is sometimes useful for docstrings and similar.
2484
2485    Same as dedent() except operates on a sequence of lines. Note: the
2486    lines list is modified **in-place**.
2487    """
2488    DEBUG = False
2489    if DEBUG:
2490        print("dedent: dedent(..., tabsize=%d, skip_first_line=%r)"\
2491              % (tabsize, skip_first_line))
2492    margin = None
2493    for i, line in enumerate(lines):
2494        if i == 0 and skip_first_line: continue
2495        indent = 0
2496        for ch in line:
2497            if ch == ' ':
2498                indent += 1
2499            elif ch == '\t':
2500                indent += tabsize - (indent % tabsize)
2501            elif ch in '\r\n':
2502                continue  # skip all-whitespace lines
2503            else:
2504                break
2505        else:
2506            continue  # skip all-whitespace lines
2507        if DEBUG: print("dedent: indent=%d: %r" % (indent, line))
2508        if margin is None:
2509            margin = indent
2510        else:
2511            margin = min(margin, indent)
2512    if DEBUG: print("dedent: margin=%r" % margin)
2513
2514    if margin is not None and margin > 0:
2515        for i, line in enumerate(lines):
2516            if i == 0 and skip_first_line: continue
2517            removed = 0
2518            for j, ch in enumerate(line):
2519                if ch == ' ':
2520                    removed += 1
2521                elif ch == '\t':
2522                    removed += tabsize - (removed % tabsize)
2523                elif ch in '\r\n':
2524                    if DEBUG: print("dedent: %r: EOL -> strip up to EOL" % line)
2525                    lines[i] = lines[i][j:]
2526                    break
2527                else:
2528                    raise ValueError("unexpected non-whitespace char %r in "
2529                                     "line %r while removing %d-space margin"
2530                                     % (ch, line, margin))
2531                if DEBUG:
2532                    print("dedent: %r: %r -> removed %d/%d"\
2533                          % (line, ch, removed, margin))
2534                if removed == margin:
2535                    lines[i] = lines[i][j+1:]
2536                    break
2537                elif removed > margin:
2538                    lines[i] = ' '*(removed-margin) + lines[i][j+1:]
2539                    break
2540            else:
2541                if removed:
2542                    lines[i] = lines[i][removed:]
2543    return lines
2544
2545
2546def _dedent(text, tabsize=8, skip_first_line=False):
2547    """_dedent(text, tabsize=8, skip_first_line=False) -> dedented text
2548
2549        "text" is the text to dedent.
2550        "tabsize" is the tab width to use for indent width calculations.
2551        "skip_first_line" is a boolean indicating if the first line should
2552            be skipped for calculating the indent width and for dedenting.
2553            This is sometimes useful for docstrings and similar.
2554
2555    textwrap.dedent(s), but don't expand tabs to spaces
2556    """
2557    lines = text.splitlines(1)
2558    _dedentlines(lines, tabsize=tabsize, skip_first_line=skip_first_line)
2559    return ''.join(lines)
2560
2561
2562class _memoized(object):
2563    """Decorator that caches a function's return value each time it is called.
2564    If called later with the same arguments, the cached value is returned, and
2565    not re-evaluated.
2566
2567    http://wiki.python.org/moin/PythonDecoratorLibrary
2568    """
2569    def __init__(self, func):
2570        self.func = func
2571        self.cache = {}
2572
2573    def __call__(self, *args):
2574        try:
2575            return self.cache[args]
2576        except KeyError:
2577            self.cache[args] = value = self.func(*args)
2578            return value
2579        except TypeError:
2580            # uncachable -- for instance, passing a list as an argument.
2581            # Better to not cache than to blow up entirely.
2582            return self.func(*args)
2583
2584    def __repr__(self):
2585        """Return the function's docstring."""
2586        return self.func.__doc__
2587
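The memoizing decorator above caches on the argument tuple and silently skips caching for unhashable arguments. A standalone copy with a call counter to show both paths:

```python
class memoized:
    # Same recipe as _memoized above.
    def __init__(self, func):
        self.func = func
        self.cache = {}
    def __call__(self, *args):
        try:
            return self.cache[args]
        except KeyError:
            self.cache[args] = value = self.func(*args)
            return value
        except TypeError:
            # uncachable (e.g. a list argument) -- call through uncached
            return self.func(*args)

calls = []
@memoized
def add(a, b):
    calls.append((a, b))
    return a + b

add(1, 2); add(1, 2)
print(len(calls))       # 1 -- the second call was served from the cache
print(add([1], [2]))    # [1, 2] -- lists are unhashable, so no caching
```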
2588
2589def _xml_oneliner_re_from_tab_width(tab_width):
2590    """Standalone XML processing instruction regex."""
2591    return re.compile(r"""
2592        (?:
2593            (?<=\n\n)       # Starting after a blank line
2594            |               # or
2595            \A\n?           # the beginning of the doc
2596        )
2597        (                           # save in $1
2598            [ ]{0,%d}
2599            (?:
2600                <\?\w+\b\s+.*?\?>   # XML processing instruction
2601                |
2602                <\w+:\w+\b\s+.*?/>  # namespaced single tag
2603            )
2604            [ \t]*
2605            (?=\n{2,}|\Z)       # followed by a blank line or end of document
2606        )
2607        """ % (tab_width - 1), re.X)
2608_xml_oneliner_re_from_tab_width = _memoized(_xml_oneliner_re_from_tab_width)
2609
2610
2611def _hr_tag_re_from_tab_width(tab_width):
2612    return re.compile(r"""
2613        (?:
2614            (?<=\n\n)       # Starting after a blank line
2615            |               # or
2616            \A\n?           # the beginning of the doc
2617        )
2618        (                       # save in \1
2619            [ ]{0,%d}
2620            <(hr)               # start tag = \2
2621            \b                  # word break
2622            ([^<>])*?           #
2623            /?>                 # the matching end tag
2624            [ \t]*
2625            (?=\n{2,}|\Z)       # followed by a blank line or end of document
2626        )
2627        """ % (tab_width - 1), re.X)
2628_hr_tag_re_from_tab_width = _memoized(_hr_tag_re_from_tab_width)
2629
2630
2631def _xml_escape_attr(attr, skip_single_quote=True):
2632    """Escape the given string for use in an HTML/XML tag attribute.
2633
2634    By default this doesn't bother with escaping `'` to `&#39;`, presuming that
2635    the tag attribute is surrounded by double quotes.
2636    """
2637    escaped = _AMPERSAND_RE.sub('&amp;', attr)
2638
2639    escaped = (escaped  # build on the &amp;-escaped text, not the raw attr
2640        .replace('"', '&quot;')
2641        .replace('<', '&lt;')
2642        .replace('>', '&gt;'))
2643    if not skip_single_quote:
2644        escaped = escaped.replace("'", "&#39;")
2645    return escaped
2646
2647
2648def _xml_encode_email_char_at_random(ch):
2649    r = random()
2650    # Roughly 10% raw, 45% hex, 45% dec.
2651    # '@' *must* be encoded. I [John Gruber] insist.
2652    # Issue 26: '_' must be encoded.
2653    if r > 0.9 and ch not in "@_":
2654        return ch
2655    elif r < 0.45:
2656        # The [1:] is to drop leading '0': 0x63 -> x63
2657        return '&#%s;' % hex(ord(ch))[1:]
2658    else:
2659        return '&#%s;' % ord(ch)
2660
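Because the encoding is randomized, the useful invariant to check is the round trip: whatever mix of raw, hex, and decimal entities comes out, an HTML unescape must recover the address. A standalone copy:

```python
import html
from random import random

def xml_encode_email_char_at_random(ch):
    # Same distribution as above: ~10% raw, ~45% hex, ~45% decimal.
    r = random()
    if r > 0.9 and ch not in "@_":
        return ch
    elif r < 0.45:
        return '&#%s;' % hex(ord(ch))[1:]   # 0x63 -> x63 -> &#x63;
    else:
        return '&#%s;' % ord(ch)

addr = "mailto:foo@example.com"
encoded = ''.join(xml_encode_email_char_at_random(ch) for ch in addr)
print(html.unescape(encoded) == addr)  # True, regardless of the random mix
```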
2661
2662def _html_escape_url(attr, safe_mode=False):
2663    """Replace special characters that are potentially malicious in url string."""
2664    escaped = (attr
2665        .replace('"', '&quot;')
2666        .replace('<', '&lt;')
2667        .replace('>', '&gt;'))
2668    if safe_mode:
2669        escaped = escaped.replace('+', ' ')
2670        escaped = escaped.replace("'", "&#39;")
2671    return escaped
2672
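A standalone copy of the URL escaping above, showing both the default and safe-mode behaviour:

```python
def html_escape_url(attr, safe_mode=False):
    # Same replacements as _html_escape_url above.
    escaped = (attr
        .replace('"', '&quot;')
        .replace('<', '&lt;')
        .replace('>', '&gt;'))
    if safe_mode:
        escaped = escaped.replace('+', ' ')
        escaped = escaped.replace("'", "&#39;")
    return escaped

print(html_escape_url('http://x/?q="><script>'))
# http://x/?q=&quot;&gt;&lt;script&gt;
```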
2673
2674# ---- mainline
2675
2676class _NoReflowFormatter(optparse.IndentedHelpFormatter):
2677    """An optparse formatter that does NOT reflow the description."""
2678    def format_description(self, description):
2679        return description or ""
2680
2681
2682def _test():
2683    import doctest
2684    doctest.testmod()
2685
2686
2687def main(argv=None):
2688    if argv is None:
2689        argv = sys.argv
2690    if not logging.root.handlers:
2691        logging.basicConfig()
2692
2693    usage = "usage: %prog [PATHS...]"
2694    version = "%prog "+__version__
2695    parser = optparse.OptionParser(prog="markdown2", usage=usage,
2696        version=version, description=cmdln_desc,
2697        formatter=_NoReflowFormatter())
2698    parser.add_option("-v", "--verbose", dest="log_level",
2699                      action="store_const", const=logging.DEBUG,
2700                      help="more verbose output")
2701    parser.add_option("--encoding",
2702                      help="specify encoding of text content")
2703    parser.add_option("--html4tags", action="store_true", default=False,
2704                      help="use HTML 4 style for empty element tags")
2705    parser.add_option("-s", "--safe", metavar="MODE", dest="safe_mode",
2706                      help="sanitize literal HTML: 'escape' escapes "
2707                           "HTML meta chars, 'replace' replaces with an "
2708                           "[HTML_REMOVED] note")
2709    parser.add_option("-x", "--extras", action="append",
2710                      help="Turn on specific extra features (not part of "
2711                           "the core Markdown spec). See above.")
2712    parser.add_option("--use-file-vars",
2713                      help="Look for and use Emacs-style 'markdown-extras' "
2714                           "file var to turn on extras. See "
2715                           "<https://github.com/trentm/python-markdown2/wiki/Extras>")
2716    parser.add_option("--link-patterns-file",
2717                      help="path to a link pattern file")
2718    parser.add_option("--self-test", action="store_true",
2719                      help="run internal self-tests (some doctests)")
2720    parser.add_option("--compare", action="store_true",
2721                      help="run against Markdown.pl as well (for testing)")
2722    parser.set_defaults(log_level=logging.INFO, compare=False,
2723                        encoding="utf-8", safe_mode=None, use_file_vars=False)
2724    opts, paths = parser.parse_args()
2725    log.setLevel(opts.log_level)
2726
2727    if opts.self_test:
2728        return _test()
2729
2730    if opts.extras:
2731        extras = {}
2732        for s in opts.extras:
2733            splitter = re.compile("[,;: ]+")
2734            for e in splitter.split(s):
2735                if '=' in e:
2736                    ename, earg = e.split('=', 1)
2737                    try:
2738                        earg = int(earg)
2739                    except ValueError:
2740                        pass
2741                else:
2742                    ename, earg = e, None
2743                extras[ename] = earg
2744    else:
2745        extras = None
2746
2747    if opts.link_patterns_file:
2748        link_patterns = []
2749        f = open(opts.link_patterns_file)
2750        try:
2751            for i, line in enumerate(f.readlines()):
2752                if not line.strip(): continue
2753                if line.lstrip().startswith("#"): continue
2754                try:
2755                    pat, href = line.rstrip().rsplit(None, 1)
2756                except ValueError:
2757                    raise MarkdownError("%s:%d: invalid link pattern line: %r"
2758                                        % (opts.link_patterns_file, i+1, line))
2759                link_patterns.append(
2760                    (_regex_from_encoded_pattern(pat), href))
2761        finally:
2762            f.close()
2763    else:
2764        link_patterns = None
2765
2766    from os.path import join, dirname, abspath, exists
2767    markdown_pl = join(dirname(dirname(abspath(__file__))), "test",
2768                       "Markdown.pl")
2769    if not paths:
2770        paths = ['-']
2771    for path in paths:
2772        if path == '-':
2773            text = sys.stdin.read()
2774        else:
2775            fp = codecs.open(path, 'r', opts.encoding)
2776            text = fp.read()
2777            fp.close()
2778        if opts.compare:
2779            from subprocess import Popen, PIPE
2780            print("==== Markdown.pl ====")
2781            p = Popen('perl %s' % markdown_pl, shell=True, stdin=PIPE, stdout=PIPE, close_fds=True)
2782            p.stdin.write(text.encode('utf-8'))
2783            p.stdin.close()
2784            perl_html = p.stdout.read().decode('utf-8')
2785            if py3:
2786                sys.stdout.write(perl_html)
2787            else:
2788                sys.stdout.write(perl_html.encode(
2789                    sys.stdout.encoding or "utf-8", 'xmlcharrefreplace'))
2790            print("==== markdown2.py ====")
2791        html = markdown(text,
2792            html4tags=opts.html4tags,
2793            safe_mode=opts.safe_mode,
2794            extras=extras, link_patterns=link_patterns,
2795            use_file_vars=opts.use_file_vars,
2796            cli=True)
2797        if py3:
2798            sys.stdout.write(html)
2799        else:
2800            sys.stdout.write(html.encode(
2801                sys.stdout.encoding or "utf-8", 'xmlcharrefreplace'))
2802        if extras and "toc" in extras:
2803            log.debug("toc_html: " +
2804                str(html.toc_html.encode(sys.stdout.encoding or "utf-8", 'xmlcharrefreplace')))
2805        if opts.compare:
2806            test_dir = join(dirname(dirname(abspath(__file__))), "test")
2807            if exists(join(test_dir, "test_markdown2.py")):
2808                sys.path.insert(0, test_dir)
2809                from test_markdown2 import norm_html_from_html
2810                norm_html = norm_html_from_html(html)
2811                norm_perl_html = norm_html_from_html(perl_html)
2812            else:
2813                norm_html = html
2814                norm_perl_html = perl_html
2815            print("==== match? %r ====" % (norm_perl_html == norm_html))
2816
2817
2818if __name__ == "__main__":
2819    sys.exit(main(sys.argv))
class MarkdownError(builtins.Exception):
159class MarkdownError(Exception):
160    pass

Exception raised for markdown2-specific errors (e.g. an invalid link-patterns file in the CLI).

def markdown_path( path, encoding='utf-8', html4tags=False, tab_width=4, safe_mode=None, extras=None, link_patterns=None, footnote_title=None, footnote_return_symbol=None, use_file_vars=False)
165def markdown_path(path, encoding="utf-8",
166                  html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
167                  safe_mode=None, extras=None, link_patterns=None,
168                  footnote_title=None, footnote_return_symbol=None,
169                  use_file_vars=False):
170    fp = codecs.open(path, 'r', encoding)
171    text = fp.read()
172    fp.close()
173    return Markdown(html4tags=html4tags, tab_width=tab_width,
174                    safe_mode=safe_mode, extras=extras,
175                    link_patterns=link_patterns,
176                    footnote_title=footnote_title,
177                    footnote_return_symbol=footnote_return_symbol,
178                    use_file_vars=use_file_vars).convert(text)
def markdown(text, html4tags=False, tab_width=4, safe_mode=None, extras=None, link_patterns=None, footnote_title=None, footnote_return_symbol=None, use_file_vars=False, cli=False)
181def markdown(text, html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
182             safe_mode=None, extras=None, link_patterns=None,
183             footnote_title=None, footnote_return_symbol=None,
184             use_file_vars=False, cli=False):
185    return Markdown(html4tags=html4tags, tab_width=tab_width,
186                    safe_mode=safe_mode, extras=extras,
187                    link_patterns=link_patterns,
188                    footnote_title=footnote_title,
189                    footnote_return_symbol=footnote_return_symbol,
190                    use_file_vars=use_file_vars, cli=cli).convert(text)
class Markdown:
 193class Markdown(object):
 194    # The dict of "extras" to enable in processing -- a mapping of
 195    # extra name to argument for the extra. Most extras do not have an
 196    # argument, in which case the value is None.
 197    #
 198    # This can be set via (a) subclassing and (b) the constructor
 199    # "extras" argument.
 200    extras = None
 201
 202    urls = None
 203    titles = None
 204    html_blocks = None
 205    html_spans = None
 206    html_removed_text = "{(#HTML#)}"  # placeholder removed text that does not trigger bold
 207    html_removed_text_compat = "[HTML_REMOVED]"  # for compat with markdown.py
 208
 209    _toc = None
 210
 211    # Used to track when we're inside an ordered or unordered list
 212    # (see _ProcessListItems() for details):
 213    list_level = 0
 214
 215    _ws_only_line_re = re.compile(r"^[ \t]+$", re.M)
 216
 217    def __init__(self, html4tags=False, tab_width=4, safe_mode=None,
 218                 extras=None, link_patterns=None,
 219                 footnote_title=None, footnote_return_symbol=None,
 220                 use_file_vars=False, cli=False):
 221        if html4tags:
 222            self.empty_element_suffix = ">"
 223        else:
 224            self.empty_element_suffix = " />"
 225        self.tab_width = tab_width
 226        self.tab = tab_width * " "
 227
 228        # For compatibility with earlier markdown2.py and with
 229        # markdown.py's safe_mode being a boolean,
 230        #   safe_mode == True -> "replace"
 231        if safe_mode is True:
 232            self.safe_mode = "replace"
 233        else:
 234            self.safe_mode = safe_mode
 235
 236        # Massaging and building the "extras" info.
 237        if self.extras is None:
 238            self.extras = {}
 239        elif not isinstance(self.extras, dict):
 240            self.extras = dict([(e, None) for e in self.extras])
 241        if extras:
 242            if not isinstance(extras, dict):
 243                extras = dict([(e, None) for e in extras])
 244            self.extras.update(extras)
 245        assert isinstance(self.extras, dict)
 246
 247        if "toc" in self.extras:
 248            if "header-ids" not in self.extras:
 249                self.extras["header-ids"] = None   # "toc" implies "header-ids"
 250
 251            if self.extras["toc"] is None:
 252                self._toc_depth = 6
 253            else:
 254                self._toc_depth = self.extras["toc"].get("depth", 6)
 255        self._instance_extras = self.extras.copy()
 256
 257        self.link_patterns = link_patterns
 258        self.footnote_title = footnote_title
 259        self.footnote_return_symbol = footnote_return_symbol
 260        self.use_file_vars = use_file_vars
 261        self._outdent_re = re.compile(r'^(\t|[ ]{1,%d})' % tab_width, re.M)
 262        self.cli = cli
 263
 264        self._escape_table = g_escape_table.copy()
 265        self._code_table = {}
 266        if "smarty-pants" in self.extras:
 267            self._escape_table['"'] = _hash_text('"')
 268            self._escape_table["'"] = _hash_text("'")
 269
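The constructor's "extras" massaging above accepts either a list of extra names or a dict mapping names to arguments, and merges class-level settings with the constructor argument. A standalone sketch of that normalization (a simplified stand-in for the logic, not the class method itself):

```python
def normalize_extras(class_extras, ctor_extras):
    """Merge class-level and constructor 'extras' into one dict.

    Either argument may be None, a list of extra names (each mapped
    to None), or a dict mapping extra name -> argument.
    """
    if class_extras is None:
        merged = {}
    elif not isinstance(class_extras, dict):
        merged = {e: None for e in class_extras}
    else:
        merged = dict(class_extras)
    if ctor_extras:
        if not isinstance(ctor_extras, dict):
            ctor_extras = {e: None for e in ctor_extras}
        merged.update(ctor_extras)  # constructor args win over class-level ones
    return merged
```

For example, `normalize_extras(["toc"], {"toc": {"depth": 2}})` yields `{"toc": {"depth": 2}}`: the constructor's argument for an extra overrides the class-level default.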
 270    def reset(self):
 271        self.urls = {}
 272        self.titles = {}
 273        self.html_blocks = {}
 274        self.html_spans = {}
 275        self.list_level = 0
 276        self.extras = self._instance_extras.copy()
 277        if "footnotes" in self.extras:
 278            self.footnotes = {}
 279            self.footnote_ids = []
 280        if "header-ids" in self.extras:
 281            self._count_from_header_id = defaultdict(int)
 282        if "metadata" in self.extras:
 283            self.metadata = {}
 284        self._toc = None
 285
 286    # Per <https://developer.mozilla.org/en-US/docs/HTML/Element/a> "rel"
 287    # should only be used in <a> tags with an "href" attribute.
 288
 289    # Opens the linked document in a new window or tab
 290    # should only be used in <a> tags with an "href" attribute.
 291    # same with _a_nofollow
 292    _a_nofollow_or_blank_links = re.compile(r"""
 293        <(a)
 294        (
 295            [^>]*
 296            href=   # href is required
 297            ['"]?   # HTML5 attribute values do not have to be quoted
 298            [^#'"]  # We don't want to match href values that start with # (like footnotes)
 299        )
 300        """,
 301        re.IGNORECASE | re.VERBOSE
 302    )
 303
 304    def convert(self, text):
 305        """Convert the given text."""
 306        # Main function. The order in which other subs are called here is
 307        # essential. Link and image substitutions need to happen before
 308        # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
 309        # and <img> tags get encoded.
 310
 311        # Clear the global hashes. If we don't clear these, you get conflicts
 312        # from other articles when generating a page which contains more than
 313        # one article (e.g. an index page that shows the N most recent
 314        # articles):
 315        self.reset()
 316
 317        if not isinstance(text, unicode):
 318            # TODO: perhaps shouldn't presume UTF-8 for string input?
 319            text = unicode(text, 'utf-8')
 320
 321        if self.use_file_vars:
 322            # Look for emacs-style file variable hints.
 323            emacs_vars = self._get_emacs_vars(text)
 324            if "markdown-extras" in emacs_vars:
 325                splitter = re.compile("[ ,]+")
 326                for e in splitter.split(emacs_vars["markdown-extras"]):
 327                    if '=' in e:
 328                        ename, earg = e.split('=', 1)
 329                        try:
 330                            earg = int(earg)
 331                        except ValueError:
 332                            pass
 333                    else:
 334                        ename, earg = e, None
 335                    self.extras[ename] = earg
 336
 337        # Standardize line endings:
 338        text = text.replace("\r\n", "\n")
 339        text = text.replace("\r", "\n")
 340
 341        # Make sure $text ends with a couple of newlines:
 342        text += "\n\n"
 343
 344        # Convert all tabs to spaces.
 345        text = self._detab(text)
 346
 347        # Strip any lines consisting only of spaces and tabs.
 348        # This makes subsequent regexen easier to write, because we can
 349        # match consecutive blank lines with /\n+/ instead of something
 350        # contorted like /[ \t]*\n+/ .
 351        text = self._ws_only_line_re.sub("", text)
 352
 353        # Strip metadata from the head and extract it.
 354        if "metadata" in self.extras:
 355            text = self._extract_metadata(text)
 356
 357        text = self.preprocess(text)
 358
 359        if "fenced-code-blocks" in self.extras and not self.safe_mode:
 360            text = self._do_fenced_code_blocks(text)
 361
 362        if self.safe_mode:
 363            text = self._hash_html_spans(text)
 364
 365        # Turn block-level HTML blocks into hash entries
 366        text = self._hash_html_blocks(text, raw=True)
 367
 368        if "fenced-code-blocks" in self.extras and self.safe_mode:
 369            text = self._do_fenced_code_blocks(text)
 370
 371        # Because numbering references aren't links (yet?), we can do everything
 372        # associated with counters before we get started.
 373        if "numbering" in self.extras:
 374            text = self._do_numbering(text)
 375
 376        # Strip link definitions, store in hashes.
 377        if "footnotes" in self.extras:
 378            # Must do footnotes first because an unlucky footnote defn
 379            # looks like a link defn:
 380            #   [^4]: this "looks like a link defn"
 381            text = self._strip_footnote_definitions(text)
 382        text = self._strip_link_definitions(text)
 383
 384        text = self._run_block_gamut(text)
 385
 386        if "footnotes" in self.extras:
 387            text = self._add_footnotes(text)
 388
 389        text = self.postprocess(text)
 390
 391        text = self._unescape_special_chars(text)
 392
 393        if self.safe_mode:
 394            text = self._unhash_html_spans(text)
 395            # return the removed text warning to its markdown.py compatible form
 396            text = text.replace(self.html_removed_text, self.html_removed_text_compat)
 397
 398        do_target_blank_links = "target-blank-links" in self.extras
 399        do_nofollow_links = "nofollow" in self.extras
 400
 401        if do_target_blank_links and do_nofollow_links:
 402            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow noopener" target="_blank"\2', text)
 403        elif do_target_blank_links:
 404            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="noopener" target="_blank"\2', text)
 405        elif do_nofollow_links:
 406            text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow"\2', text)
 407
 408        if "toc" in self.extras and self._toc:
 409            self._toc_html = calculate_toc_html(self._toc)
 410
 411            # Prepend toc html to output
 412            if self.cli:
 413                text = '{}\n{}'.format(self._toc_html, text)
 414
 415        text += "\n"
 416
 417        # Attach attrs to output
 418        rv = UnicodeWithAttrs(text)
 419
 420        if "toc" in self.extras and self._toc:
 421            rv.toc_html = self._toc_html
 422
 423        if "metadata" in self.extras:
 424            rv.metadata = self.metadata
 425        return rv
 426
 427    def postprocess(self, text):
 428        """A hook for subclasses to do some postprocessing of the html, if
 429        desired. This is called before unescaping of special chars and
 430        unhashing of raw HTML spans.
 431        """
 432        return text
 433
 434    def preprocess(self, text):
 435        """A hook for subclasses to do some preprocessing of the Markdown, if
 436        desired. This is called after basic formatting of the text, but prior
 437        to any extras, safe mode, etc. processing.
 438        """
 439        return text
 440
 441    # The text is treated as metadata if it starts with optional
 442    # '---'-fenced `key: value` pairs. E.g. (indented for presentation):
 443    #   ---
 444    #   foo: bar
 445    #   another-var: blah blah
 446    #   ---
 447    #   # header
 448    # or:
 449    #   foo: bar
 450    #   another-var: blah blah
 451    #
 452    #   # header
 453    _meta_data_pattern = re.compile(r'^(?:---[\ \t]*\n)?((?:[\S\w]+\s*:(?:\n+[ \t]+.*)+)|(?:.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)|(?:\s*[\S\w]+\s*:(?! >)[ \t]*.*\n?))(?:---[\ \t]*\n)?', re.MULTILINE)
 454    _key_val_pat = re.compile(r"[\S\w]+\s*:(?! >)[ \t]*.*\n?", re.MULTILINE)
 455    # this allows key: >
 456    #                   value
 457    #                   continues over multiple lines
 458    _key_val_block_pat = re.compile(
 459        r"(.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)", re.MULTILINE
 460    )
 461    _key_val_list_pat = re.compile(
 462        r"^-(?:[ \t]*([^\n]*)(?:[ \t]*[:-][ \t]*(\S+))?)(?:\n((?:[ \t]+[^\n]+\n?)+))?",
 463        re.MULTILINE,
 464    )
 465    _key_val_dict_pat = re.compile(
 466        r"^([^:\n]+)[ \t]*:[ \t]*([^\n]*)(?:((?:\n[ \t]+[^\n]+)+))?", re.MULTILINE
 467    )  # grp0: key, grp1: value, grp2: multiline value
 468    _meta_data_fence_pattern = re.compile(r'^---[\ \t]*\n', re.MULTILINE)
 469    _meta_data_newline = re.compile("^\n", re.MULTILINE)
 470
 471    def _extract_metadata(self, text):
 472        if text.startswith("---"):
 473            fence_splits = re.split(self._meta_data_fence_pattern, text, maxsplit=2)
 474            metadata_content = fence_splits[1]
 475            match = re.findall(self._meta_data_pattern, metadata_content)
 476            if not match:
 477                return text
 478            tail = fence_splits[2]
 479        else:
 480            metadata_split = re.split(self._meta_data_newline, text, maxsplit=1)
 481            metadata_content = metadata_split[0]
 482            match = re.findall(self._meta_data_pattern, metadata_content)
 483            if not match:
 484                return text
 485            tail = metadata_split[1]
 486
 487        def parse_structured_value(value):
 488            vs = value.lstrip()
 489            vs = value.replace(v[: len(value) - len(vs)], "\n")[1:]
 490
 491            # List
 492            if vs.startswith("-"):
 493                r = []
 494                for match in re.findall(self._key_val_list_pat, vs):
 495                    if match[0] and not match[1] and not match[2]:
 496                        r.append(match[0].strip())
 497                    elif match[0] == ">" and not match[1] and match[2]:
 498                        r.append(match[2].strip())
 499                    elif match[0] and match[1]:
 500                        r.append({match[0].strip(): match[1].strip()})
 501                    elif not match[0] and not match[1] and match[2]:
 502                        r.append(parse_structured_value(match[2]))
 503                    else:
 504                        # Broken case
 505                        pass
 506
 507                return r
 508
 509            # Dict
 510            else:
 511                return {
 512                    match[0].strip(): (
 513                        match[1].strip()
 514                        if match[1]
 515                        else parse_structured_value(match[2])
 516                    )
 517                    for match in re.findall(self._key_val_dict_pat, vs)
 518                }
 519
 520        for item in match:
 521
 522            k, v = item.split(":", 1)
 523
 524            # Multiline value
 525            if v[:3] == " >\n":
 526                self.metadata[k.strip()] = _dedent(v[3:]).strip()
 527
 528            # Empty value
 529            elif v == "\n":
 530                self.metadata[k.strip()] = ""
 531
 532            # Structured value
 533            elif v[0] == "\n":
 534                self.metadata[k.strip()] = parse_structured_value(v)
 535
 536            # Simple value
 537            else:
 538                self.metadata[k.strip()] = v.strip()
 539
 540        return tail
 541
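The front-matter handling in _extract_metadata can be sketched in a much-simplified standalone form. This sketch handles only flat `key: value` pairs inside a '---' fence; the real implementation above also parses multiline values, nested lists, and dicts:

```python
import re

def extract_simple_metadata(text):
    """Split '---'-fenced front matter off the head of a document.

    Returns (metadata_dict, remaining_text). Simplified sketch:
    flat key/value pairs only.
    """
    if not text.startswith("---"):
        return {}, text
    parts = re.split(r"^---[ \t]*\n", text, maxsplit=2, flags=re.MULTILINE)
    if len(parts) < 3:  # no closing fence found
        return {}, text
    meta = {}
    for line in parts[1].splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            meta[k.strip()] = v.strip()
    return meta, parts[2]
```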
 542    _emacs_oneliner_vars_pat = re.compile(r"-\*-\s*(?:(\S[^\r\n]*?)([\r\n]\s*)?)?-\*-", re.UNICODE)
 543    # This regular expression is intended to match blocks like this:
 544    #    PREFIX Local Variables: SUFFIX
 545    #    PREFIX mode: Tcl SUFFIX
 546    #    PREFIX End: SUFFIX
 547    # Some notes:
 548    # - "[ \t]" is used instead of "\s" to specifically exclude newlines
 549    # - "(\r\n|\n|\r)" is used instead of "$" because the sre engine does
 550    #   not like anything other than Unix-style line terminators.
 551    _emacs_local_vars_pat = re.compile(r"""^
 552        (?P<prefix>(?:[^\r\n|\n|\r])*?)
 553        [\ \t]*Local\ Variables:[\ \t]*
 554        (?P<suffix>.*?)(?:\r\n|\n|\r)
 555        (?P<content>.*?\1End:)
 556        """, re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)
 557
 558    def _get_emacs_vars(self, text):
 559        """Return a dictionary of emacs-style local variables.
 560
 561        Parsing is done loosely according to this spec (and according to
 562        some in-practice deviations from this):
 563        http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html#Specifying-File-Variables
 564        """
 565        emacs_vars = {}
 566        SIZE = pow(2, 13)  # 8kB
 567
 568        # Search near the start for a '-*-'-style one-liner of variables.
 569        head = text[:SIZE]
 570        if "-*-" in head:
 571            match = self._emacs_oneliner_vars_pat.search(head)
 572            if match:
 573                emacs_vars_str = match.group(1)
 574                assert '\n' not in emacs_vars_str
 575                emacs_var_strs = [s.strip() for s in emacs_vars_str.split(';')
 576                                  if s.strip()]
 577                if len(emacs_var_strs) == 1 and ':' not in emacs_var_strs[0]:
 578                    # While not in the spec, this form is allowed by emacs:
 579                    #   -*- Tcl -*-
 580                    # where the implied "variable" is "mode". This form
 581                    # is only allowed if there are no other variables.
 582                    emacs_vars["mode"] = emacs_var_strs[0].strip()
 583                else:
 584                    for emacs_var_str in emacs_var_strs:
 585                        try:
 586                            variable, value = emacs_var_str.strip().split(':', 1)
 587                        except ValueError:
 588                            log.debug("emacs variables error: malformed -*- "
 589                                      "line: %r", emacs_var_str)
 590                            continue
 591                        # Lowercase the variable name because Emacs allows "Mode"
 592                        # or "mode" or "MoDe", etc.
 593                        emacs_vars[variable.lower()] = value.strip()
 594
 595        tail = text[-SIZE:]
 596        if "Local Variables" in tail:
 597            match = self._emacs_local_vars_pat.search(tail)
 598            if match:
 599                prefix = match.group("prefix")
 600                suffix = match.group("suffix")
 601                lines = match.group("content").splitlines(0)
 602                # print "prefix=%r, suffix=%r, content=%r, lines: %s"\
 603                #      % (prefix, suffix, match.group("content"), lines)
 604
 605                # Validate the Local Variables block: proper prefix and suffix
 606                # usage.
 607                for i, line in enumerate(lines):
 608                    if not line.startswith(prefix):
 609                        log.debug("emacs variables error: line '%s' "
 610                                  "does not use proper prefix '%s'"
 611                                  % (line, prefix))
 612                        return {}
 613                    # Don't validate suffix on last line. Emacs doesn't care,
 614                    # neither should we.
 615                    if i != len(lines)-1 and not line.endswith(suffix):
 616                        log.debug("emacs variables error: line '%s' "
 617                                  "does not use proper suffix '%s'"
 618                                  % (line, suffix))
 619                        return {}
 620
 621                # Parse out one emacs var per line.
 622                continued_for = None
 623                for line in lines[:-1]:  # no var on the last line ("PREFIX End:")
 624                    if prefix: line = line[len(prefix):]  # strip prefix
 625                    if suffix: line = line[:-len(suffix)]  # strip suffix
 626                    line = line.strip()
 627                    if continued_for:
 628                        variable = continued_for
 629                        if line.endswith('\\'):
 630                            line = line[:-1].rstrip()
 631                        else:
 632                            continued_for = None
 633                        emacs_vars[variable] += ' ' + line
 634                    else:
 635                        try:
 636                            variable, value = line.split(':', 1)
 637                        except ValueError:
 638                            log.debug("local variables error: missing colon "
 639                                      "in local variables entry: '%s'" % line)
 640                            continue
 641                        # Do NOT lowercase the variable name, because Emacs only
 642                        # allows "mode" (and not "Mode", "MoDe", etc.) in this block.
 643                        value = value.strip()
 644                        if value.endswith('\\'):
 645                            value = value[:-1].rstrip()
 646                            continued_for = variable
 647                        else:
 648                            continued_for = None
 649                        emacs_vars[variable] = value
 650
 651        # Unquote values.
 652        for var, val in list(emacs_vars.items()):
 653            if len(val) > 1 and (val.startswith('"') and val.endswith('"')
 654               or val.startswith("'") and val.endswith("'")):
 655                emacs_vars[var] = val[1:-1]
 656
 657        return emacs_vars
 658
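A standalone sketch of the '-*-'-style one-liner parsing above (assumption: this covers only the one-liner form, not the "Local Variables:" block that _get_emacs_vars also handles):

```python
import re

# Same one-liner pattern as Markdown._emacs_oneliner_vars_pat.
_oneliner = re.compile(r"-\*-\s*(?:(\S[^\r\n]*?)([\r\n]\s*)?)?-\*-")

def parse_oneliner_vars(text):
    """Parse emacs-style '-*- var: value; ... -*-' variables near the start."""
    m = _oneliner.search(text[:8192])  # markdown2 only looks at the first 8kB
    if not m or not m.group(1):
        return {}
    items = [s.strip() for s in m.group(1).split(";") if s.strip()]
    if len(items) == 1 and ":" not in items[0]:
        return {"mode": items[0]}  # '-*- markdown -*-' implies the mode variable
    out = {}
    for item in items:
        var, _, val = item.partition(":")
        out[var.strip().lower()] = val.strip()  # Emacs allows "Mode", "MoDe", ...
    return out
```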
 659    def _detab_line(self, line):
 660        r"""Recusively convert tabs to spaces in a single line.
 661
 662        Called from _detab()."""
 663        if '\t' not in line:
 664            return line
 665        chunk1, chunk2 = line.split('\t', 1)
 666        chunk1 += (' ' * (self.tab_width - len(chunk1) % self.tab_width))
 667        output = chunk1 + chunk2
 668        return self._detab_line(output)
 669
 670    def _detab(self, text):
 671        r"""Iterate text line by line and convert tabs to spaces.
 672
 673            >>> m = Markdown()
 674            >>> m._detab("\tfoo")
 675            '    foo'
 676            >>> m._detab("  \tfoo")
 677            '    foo'
 678            >>> m._detab("\t  foo")
 679            '      foo'
 680            >>> m._detab("  foo")
 681            '  foo'
 682            >>> m._detab("  foo\n\tbar\tblam")
 683            '  foo\n    bar blam'
 684        """
 685        if '\t' not in text:
 686            return text
 687        output = []
 688        for line in text.splitlines():
 689            output.append(self._detab_line(line))
 690        return '\n'.join(output)
 691
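The recursion in _detab_line can equivalently be written as a loop: each tab is replaced by enough spaces to reach the next tab stop, so the padding depends on the column where the tab occurs. A standalone sketch:

```python
def detab_line(line, tab_width=4):
    """Expand tabs to spaces, padding each tab out to the next tab stop."""
    while '\t' in line:
        chunk1, chunk2 = line.split('\t', 1)
        # Pad to the next multiple of tab_width.
        chunk1 += ' ' * (tab_width - len(chunk1) % tab_width)
        line = chunk1 + chunk2
    return line
```

This mirrors the doctest examples above: a tab after two leading spaces expands to only two more spaces, since the next tab stop is at column 4.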
 692    # I broke out the html5 tags here and add them to _block_tags_a and
 693    # _block_tags_b.  This way html5 tags are easy to keep track of.
 694    _html5tags = '|article|aside|header|hgroup|footer|nav|section|figure|figcaption'
 695
 696    _block_tags_a = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del'
 697    _block_tags_a += _html5tags
 698
 699    _strict_tag_block_re = re.compile(r"""
 700        (                       # save in \1
 701            ^                   # start of line  (with re.M)
 702            <(%s)               # start tag = \2
 703            \b                  # word break
 704            (.*\n)*?            # any number of lines, minimally matching
 705            </\2>               # the matching end tag
 706            [ \t]*              # trailing spaces/tabs
 707            (?=\n+|\Z)          # followed by a newline or end of document
 708        )
 709        """ % _block_tags_a,
 710        re.X | re.M)
 711
 712    _block_tags_b = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math'
 713    _block_tags_b += _html5tags
 714
 715    _liberal_tag_block_re = re.compile(r"""
 716        (                       # save in \1
 717            ^                   # start of line  (with re.M)
 718            <(%s)               # start tag = \2
 719            \b                  # word break
 720            (.*\n)*?            # any number of lines, minimally matching
 721            .*</\2>             # the matching end tag
 722            [ \t]*              # trailing spaces/tabs
 723            (?=\n+|\Z)          # followed by a newline or end of document
 724        )
 725        """ % _block_tags_b,
 726        re.X | re.M)
 727
 728    _html_markdown_attr_re = re.compile(
 729        r'''\s+markdown=("1"|'1')''')
 730    def _hash_html_block_sub(self, match, raw=False):
 731        html = match.group(1)
 732        if raw and self.safe_mode:
 733            html = self._sanitize_html(html)
 734        elif 'markdown-in-html' in self.extras and 'markdown=' in html:
 735            first_line = html.split('\n', 1)[0]
 736            m = self._html_markdown_attr_re.search(first_line)
 737            if m:
 738                lines = html.split('\n')
 739                middle = '\n'.join(lines[1:-1])
 740                last_line = lines[-1]
 741                first_line = first_line[:m.start()] + first_line[m.end():]
 742                f_key = _hash_text(first_line)
 743                self.html_blocks[f_key] = first_line
 744                l_key = _hash_text(last_line)
 745                self.html_blocks[l_key] = last_line
 746                return ''.join(["\n\n", f_key,
 747                    "\n\n", middle, "\n\n",
 748                    l_key, "\n\n"])
 749        key = _hash_text(html)
 750        self.html_blocks[key] = html
 751        return "\n\n" + key + "\n\n"
 752
 753    def _hash_html_blocks(self, text, raw=False):
 754        """Hashify HTML blocks
 755
 756        We only want to do this for block-level HTML tags, such as headers,
 757        lists, and tables. That's because we still want to wrap <p>s around
 758        "paragraphs" that are wrapped in non-block-level tags, such as anchors,
 759        phrase emphasis, and spans. The list of tags we're looking for is
 760        hard-coded.
 761
 762        @param raw {boolean} indicates if these are raw HTML blocks in
 763            the original source. It makes a difference in "safe" mode.
 764        """
 765        if '<' not in text:
 766            return text
 767
 768        # Pass `raw` value into our calls to self._hash_html_block_sub.
 769        hash_html_block_sub = _curry(self._hash_html_block_sub, raw=raw)
 770
 771        # First, look for nested blocks, e.g.:
 772        #   <div>
 773        #       <div>
 774        #       tags for inner block must be indented.
 775        #       </div>
 776        #   </div>
 777        #
 778        # The outermost tags must start at the left margin for this to match, and
 779        # the inner nested divs must be indented.
 780        # We need to do this before the next, more liberal match, because the next
 781        # match will start at the first `<div>` and stop at the first `</div>`.
 782        text = self._strict_tag_block_re.sub(hash_html_block_sub, text)
 783
 784        # Now match more liberally, simply from `\n<tag>` to `</tag>\n`
 785        text = self._liberal_tag_block_re.sub(hash_html_block_sub, text)
 786
 787        # Special case just for <hr />. It was easier to make a special
 788        # case than to make the other regex more complicated.
 789        if "<hr" in text:
 790            _hr_tag_re = _hr_tag_re_from_tab_width(self.tab_width)
 791            text = _hr_tag_re.sub(hash_html_block_sub, text)
 792
 793        # Special case for standalone HTML comments:
 794        if "<!--" in text:
 795            start = 0
 796            while True:
 797                # Delimiters for next comment block.
 798                try:
 799                    start_idx = text.index("<!--", start)
 800                except ValueError:
 801                    break
 802                try:
 803                    end_idx = text.index("-->", start_idx) + 3
 804                except ValueError:
 805                    break
 806
 807                # Start position for next comment block search.
 808                start = end_idx
 809
 810                # Validate whitespace before comment.
 811                if start_idx:
 812                    # - Up to `tab_width - 1` spaces before start_idx.
 813                    for i in range(self.tab_width - 1):
 814                        if text[start_idx - 1] != ' ':
 815                            break
 816                        start_idx -= 1
 817                        if start_idx == 0:
 818                            break
 819                    # - Must be preceded by 2 newlines or hit the start of
 820                    #   the document.
 821                    if start_idx == 0:
 822                        pass
 823                    elif start_idx == 1 and text[0] == '\n':
 824                        start_idx = 0  # to match minute detail of Markdown.pl regex
 825                    elif text[start_idx-2:start_idx] == '\n\n':
 826                        pass
 827                    else:
 828                        break
 829
 830                # Validate whitespace after comment.
 831                # - Any number of spaces and tabs.
 832                while end_idx < len(text):
 833                    if text[end_idx] not in ' \t':
 834                        break
 835                    end_idx += 1
 836                # - Must be followed by 2 newlines or hit end of text.
 837                if text[end_idx:end_idx+2] not in ('', '\n', '\n\n'):
 838                    continue
 839
 840                # Escape and hash (must match `_hash_html_block_sub`).
 841                html = text[start_idx:end_idx]
 842                if raw and self.safe_mode:
 843                    html = self._sanitize_html(html)
 844                key = _hash_text(html)
 845                self.html_blocks[key] = html
 846                text = text[:start_idx] + "\n\n" + key + "\n\n" + text[end_idx:]
 847
 848        if "xml" in self.extras:
 849            # Treat XML processing instructions and namespaced one-liner
 850            # tags as if they were block HTML tags. E.g., if standalone
 851            # (i.e. are their own paragraph), the following do not get
 852            # wrapped in a <p> tag:
 853            #    <?foo bar?>
 854            #
 855            #    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="chapter_1.md"/>
 856            _xml_oneliner_re = _xml_oneliner_re_from_tab_width(self.tab_width)
 857            text = _xml_oneliner_re.sub(hash_html_block_sub, text)
 858
 859        return text
 860
 861    def _strip_link_definitions(self, text):
 862        # Strips link definitions from text, stores the URLs and titles in
 863        # hash references.
 864        less_than_tab = self.tab_width - 1
 865
 866        # Link defs are in the form:
 867        #   [id]: url "optional title"
 868        _link_def_re = re.compile(r"""
 869            ^[ ]{0,%d}\[(.+)\]: # id = \1
 870              [ \t]*
 871              \n?               # maybe *one* newline
 872              [ \t]*
 873            <?(.+?)>?           # url = \2
 874              [ \t]*
 875            (?:
 876                \n?             # maybe one newline
 877                [ \t]*
 878                (?<=\s)         # lookbehind for whitespace
 879                ['"(]
 880                ([^\n]*)        # title = \3
 881                ['")]
 882                [ \t]*
 883            )?  # title is optional
 884            (?:\n+|\Z)
 885            """ % less_than_tab, re.X | re.M | re.U)
 886        return _link_def_re.sub(self._extract_link_def_sub, text)
 887
 888    def _extract_link_def_sub(self, match):
 889        id, url, title = match.groups()
 890        key = id.lower()    # Link IDs are case-insensitive
 891        self.urls[key] = self._encode_amps_and_angles(url)
 892        if title:
 893            self.titles[key] = title
 894        return ""
 895
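A simplified standalone sketch of the link-definition stripping above: definitions of the form `[id]: url "optional title"` are removed from the text and collected into `urls`/`titles` dicts keyed by the lowercased id. This sketch assumes a fixed 3-space maximum indent and a single-line form; the real regex is more permissive:

```python
import re

_link_def_re = re.compile(r"""
    ^[ ]{0,3}\[(.+)\]:  # id = \1, up to tab_width-1 leading spaces
      [ \t]*\n?[ \t]*
    <?(\S+?)>?          # url = \2
    (?:
        [ \t]+
        ['"(]([^\n]*)['")]   # title = \3, optional
    )?
    [ \t]*(?:\n+|\Z)
    """, re.X | re.M)

urls, titles = {}, {}

def _collect(m):
    id_, url, title = m.groups()
    urls[id_.lower()] = url  # link IDs are case-insensitive
    if title:
        titles[id_.lower()] = title
    return ""  # the definition itself is stripped from the output

text = '[Foo]: http://example.com "Example"\nBody text.\n'
body = _link_def_re.sub(_collect, text)
# body is now 'Body text.\n'; urls/titles hold the definition.
```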
 896    def _do_numbering(self, text):
 897        ''' We handle the special extension for generic numbering for
 898            tables, figures etc.
 899        '''
 900        # First pass to define all the references
 901        self.regex_defns = re.compile(r'''
 902            \[\#(\w+) # the counter.  Open square plus hash plus a word \1
 903            ([^@]*)   # Some optional characters, that aren't an @. \2
 904            @(\w+)       # the id.  Should this be normed? \3
 905            ([^\]]*)\]   # The rest of the text up to the terminating ] \4
 906            ''', re.VERBOSE)
 907        self.regex_subs = re.compile(r"\[@(\w+)\s*\]")  # [@ref_id]
 908        counters = {}
 909        references = {}
 910        replacements = []
 911        definition_html = '<figcaption class="{}" id="counter-ref-{}">{}{}{}</figcaption>'
 912        reference_html = '<a class="{}" href="#counter-ref-{}">{}</a>'
 913        for match in self.regex_defns.finditer(text):
 914            # We must have four match groups otherwise this isn't a numbering reference
 915            if len(match.groups()) != 4:
 916                continue
 917            counter = match.group(1)
 918            text_before = match.group(2).strip()
 919            ref_id = match.group(3)
 920            text_after = match.group(4)
 921            number = counters.get(counter, 1)
 922            references[ref_id] = (number, counter)
 923            replacements.append((match.start(0),
 924                                 definition_html.format(counter,
 925                                                        ref_id,
 926                                                        text_before,
 927                                                        number,
 928                                                        text_after),
 929                                 match.end(0)))
 930            counters[counter] = number + 1
 931        for repl in reversed(replacements):
 932            text = text[:repl[0]] + repl[1] + text[repl[2]:]
 933
 934        # Second pass to replace the references with the right
 935        # value of the counter
 936        # Fwiw, it's vaguely annoying to have to turn the iterator into
 937        # a list and then reverse it but I can't think of a better thing to do.
 938        for match in reversed(list(self.regex_subs.finditer(text))):
 939            number, counter = references.get(match.group(1), (None, None))
 940            if number is not None:
 941                repl = reference_html.format(counter,
 942                                             match.group(1),
 943                                             number)
 944            else:
 945                repl = reference_html.format(match.group(1),
 946                                             'countererror',
 947                                             '?' + match.group(1) + '?')
 948            if "smarty-pants" in self.extras:
 949                repl = repl.replace('"', self._escape_table['"'])
 950
 951            text = text[:match.start()] + repl + text[match.end():]
 952        return text
 953
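The two-pass scheme in `_do_numbering()` — number the definitions first, then resolve the `[@ref_id]` references — can be sketched with a stripped-down, hypothetical helper (plain text output instead of the `<figcaption>`/`<a>` HTML above):

```python
import re

# Definitions look like [#figure Caption @fig1]; references look like [@fig1].
DEF_RE = re.compile(r'\[\#(\w+)([^@]*)@(\w+)([^\]]*)\]')
REF_RE = re.compile(r'\[@(\w+)\s*\]')

def do_numbering(text):
    counters, references = {}, {}
    def _define(match):
        counter, before, ref_id, after = match.groups()
        number = counters.get(counter, 1)      # per-counter numbering
        counters[counter] = number + 1
        references[ref_id] = number
        return '%s %d%s' % (before.strip(), number, after)
    text = DEF_RE.sub(_define, text)           # first pass: number definitions
    return REF_RE.sub(                         # second pass: resolve references
        lambda m: str(references.get(m.group(1), '?')), text)

out = do_numbering('[#figure A cat @fig1]\nSee figure [@fig1].')
```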
 954    def _extract_footnote_def_sub(self, match):
 955        id, text = match.groups()
 956        text = _dedent(text, skip_first_line=not text.startswith('\n')).strip()
 957        normed_id = re.sub(r'\W', '-', id)
 958        # Ensure footnote text ends with a couple newlines (for some
 959        # block gamut matches).
 960        self.footnotes[normed_id] = text + "\n\n"
 961        return ""
 962
 963    def _strip_footnote_definitions(self, text):
 964        """A footnote definition looks like this:
 965
 966            [^note-id]: Text of the note.
 967
 968                May include one or more indented paragraphs.
 969
 970        Where,
 971        - The 'note-id' can be pretty much anything, though typically it
 972          is the number of the footnote.
 973        - The first paragraph may start on the next line, like so:
 974
 975            [^note-id]:
 976                Text of the note.
 977        """
 978        less_than_tab = self.tab_width - 1
 979        footnote_def_re = re.compile(r'''
 980            ^[ ]{0,%d}\[\^(.+)\]:   # id = \1
 981            [ \t]*
 982            (                       # footnote text = \2
 983              # First line need not start with the spaces.
 984              (?:\s*.*\n+)
 985              (?:
 986                (?:[ ]{%d} | \t)  # Subsequent lines must be indented.
 987                .*\n+
 988              )*
 989            )
 990            # Lookahead for non-space at line-start, or end of doc.
 991            (?:(?=^[ ]{0,%d}\S)|\Z)
 992            ''' % (less_than_tab, self.tab_width, self.tab_width),
 993            re.X | re.M)
 994        return footnote_def_re.sub(self._extract_footnote_def_sub, text)
 995
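A simplified sketch of the footnote-stripping step (the real regex above also handles indented continuation paragraphs): single-line `[^id]: text` definitions are removed and collected into a dict keyed by the same `\W`-to-`-` normalized id:

```python
import re

# One-line footnote definitions only; the real footnote_def_re is richer.
FOOTNOTE_DEF = re.compile(r'^\[\^(.+)\]:[ \t]*(.*)\n?', re.M)

def strip_footnote_defs(text):
    footnotes = {}
    def _sub(match):
        note_id, note_text = match.groups()
        normed_id = re.sub(r'\W', '-', note_id)   # same normalization as above
        footnotes[normed_id] = note_text.strip()
        return ''
    return FOOTNOTE_DEF.sub(_sub, text), footnotes

stripped, notes = strip_footnote_defs('Hi[^a].\n[^a]: A note.\n')
```

The inline reference `[^a]` is left alone; `_do_links()` later turns it into the `<sup class="footnote-ref">` markup.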
 996    _hr_re = re.compile(r'^[ ]{0,3}([-_*])[ ]{0,2}(\1[ ]{0,2}){2,}$', re.M)
 997
 998    def _run_block_gamut(self, text):
 999        # These are all the transformations that form block-level
1000        # tags like paragraphs, headers, and list items.
1001
1002        if "fenced-code-blocks" in self.extras:
1003            text = self._do_fenced_code_blocks(text)
1004
1005        text = self._do_headers(text)
1006
1007        # Do Horizontal Rules:
1008        # On the number of spaces in horizontal rules: The spec is fuzzy: "If
1009        # you wish, you may use spaces between the hyphens or asterisks."
1010        # Markdown.pl 1.0.1's hr regexes limit the number of spaces between the
1011        # hr chars to one or two. We'll reproduce that limit here.
1012        hr = "\n<hr"+self.empty_element_suffix+"\n"
1013        text = re.sub(self._hr_re, hr, text)
1014
1015        text = self._do_lists(text)
1016
1017        if "pyshell" in self.extras:
1018            text = self._prepare_pyshell_blocks(text)
1019        if "wiki-tables" in self.extras:
1020            text = self._do_wiki_tables(text)
1021        if "tables" in self.extras:
1022            text = self._do_tables(text)
1023
1024        text = self._do_code_blocks(text)
1025
1026        text = self._do_block_quotes(text)
1027
1028        # We already ran _HashHTMLBlocks() before, in Markdown(), but that
1029        # was to escape raw HTML in the original Markdown source. This time,
1030        # we're escaping the markup we've just created, so that we don't wrap
1031        # <p> tags around block-level tags.
1032        text = self._hash_html_blocks(text)
1033
1034        text = self._form_paragraphs(text)
1035
1036        return text
1037
1038    def _pyshell_block_sub(self, match):
1039        if "fenced-code-blocks" in self.extras:
1040            dedented = _dedent(match.group(0))
1041            return self._do_fenced_code_blocks("```pycon\n" + dedented + "```\n")
1042        lines = match.group(0).splitlines(0)
1043        _dedentlines(lines)
1044        indent = ' ' * self.tab_width
1045        s = ('\n'  # separate from possible cuddled paragraph
1046             + indent + ('\n'+indent).join(lines)
1047             + '\n\n')
1048        return s
1049
1050    def _prepare_pyshell_blocks(self, text):
1051        """Ensure that Python interactive shell sessions are put in
1052        code blocks -- even if not properly indented.
1053        """
1054        if ">>>" not in text:
1055            return text
1056
1057        less_than_tab = self.tab_width - 1
1058        _pyshell_block_re = re.compile(r"""
1059            ^([ ]{0,%d})>>>[ ].*\n  # first line
1060            ^(\1[^\S\n]*\S.*\n)*    # any number of subsequent lines with at least one character
1061            ^\n                     # ends with a blank line
1062            """ % less_than_tab, re.M | re.X)
1063
1064        return _pyshell_block_re.sub(self._pyshell_block_sub, text)
1065
1066    def _table_sub(self, match):
1067        trim_space_re = '^[ \t\n]+|[ \t\n]+$'
1068        trim_bar_re = r'^\||\|$'
1069        split_bar_re = r'^\||(?<![\`\\])\|'
1070        escape_bar_re = r'\\\|'
1071
1072        head, underline, body = match.groups()
1073
1074        # Determine aligns for columns.
1075        cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", underline)))]
1076        align_from_col_idx = {}
1077        for col_idx, col in enumerate(cols):
1078            if col[0] == ':' and col[-1] == ':':
1079                align_from_col_idx[col_idx] = ' style="text-align:center;"'
1080            elif col[0] == ':':
1081                align_from_col_idx[col_idx] = ' style="text-align:left;"'
1082            elif col[-1] == ':':
1083                align_from_col_idx[col_idx] = ' style="text-align:right;"'
1084
1085        # thead
1086        hlines = ['<table%s>' % self._html_class_str_from_tag('table'), '<thead>', '<tr>']
1087        cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", head)))]
1088        for col_idx, col in enumerate(cols):
1089            hlines.append('  <th%s>%s</th>' % (
1090                align_from_col_idx.get(col_idx, ''),
1091                self._run_span_gamut(col)
1092            ))
1093        hlines.append('</tr>')
1094        hlines.append('</thead>')
1095
1096        # tbody
1097        hlines.append('<tbody>')
1098        for line in body.strip('\n').split('\n'):
1099            hlines.append('<tr>')
1100            cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", line)))]
1101            for col_idx, col in enumerate(cols):
1102                hlines.append('  <td%s>%s</td>' % (
1103                    align_from_col_idx.get(col_idx, ''),
1104                    self._run_span_gamut(col)
1105                ))
1106            hlines.append('</tr>')
1107        hlines.append('</tbody>')
1108        hlines.append('</table>')
1109
1110        return '\n'.join(hlines) + '\n'
1111
1112    def _do_tables(self, text):
1113        """Copying PHP-Markdown and GFM table syntax. Some regex borrowed from
1114        https://github.com/michelf/php-markdown/blob/lib/Michelf/Markdown.php#L2538
1115        """
1116        less_than_tab = self.tab_width - 1
1117        table_re = re.compile(r'''
1118                (?:(?<=\n\n)|\A\n?)             # leading blank line
1119
1120                ^[ ]{0,%d}                      # allowed whitespace
1121                (.*[|].*)  \n                   # $1: header row (at least one pipe)
1122
1123                ^[ ]{0,%d}                      # allowed whitespace
1124                (                               # $2: underline row
1125                    # underline row with leading bar
1126                    (?:  \|\ *:?-+:?\ *  )+  \|? \s? \n
1127                    |
1128                    # or, underline row without leading bar
1129                    (?:  \ *:?-+:?\ *\|  )+  (?:  \ *:?-+:?\ *  )? \s? \n
1130                )
1131
1132                (                               # $3: data rows
1133                    (?:
1134                        ^[ ]{0,%d}(?!\ )         # ensure line begins with 0 to less_than_tab spaces
1135                        .*\|.*  \n
1136                    )+
1137                )
1138            ''' % (less_than_tab, less_than_tab, less_than_tab), re.M | re.X)
1139        return table_re.sub(self._table_sub, text)
1140
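The alignment logic in `_table_sub()` reads the underline row cell by cell: a leading colon means left, a trailing colon means right, both mean center, and plain dashes get no style. A small standalone sketch of that mapping (symbolic values instead of the inline `style` attributes above):

```python
def align_from_underline(underline):
    # Map each cell of a GFM table underline row (e.g. "| :--- | ---: |")
    # to an alignment, mirroring the colon checks in _table_sub.
    cols = [c.strip() for c in underline.strip().strip('|').split('|')]
    aligns = {}
    for idx, col in enumerate(cols):
        if col.startswith(':') and col.endswith(':'):
            aligns[idx] = 'center'
        elif col.startswith(':'):
            aligns[idx] = 'left'
        elif col.endswith(':'):
            aligns[idx] = 'right'
        # plain '---' cells get no entry, i.e. no style attribute
    return aligns

aligns = align_from_underline('| :--- | :---: | ---: | --- |')
```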
1141    def _wiki_table_sub(self, match):
1142        ttext = match.group(0).strip()
1143        # print('wiki table: %r' % match.group(0))
1144        rows = []
1145        for line in ttext.splitlines(0):
1146            line = line.strip()[2:-2].strip()
1147            row = [c.strip() for c in re.split(r'(?<!\\)\|\|', line)]
1148            rows.append(row)
1149        # from pprint import pprint
1150        # pprint(rows)
1151        hlines = []
1152
1153        def add_hline(line, indents=0):
1154            hlines.append((self.tab * indents) + line)
1155
1156        def format_cell(text):
1157            return self._run_span_gamut(re.sub(r"^\s*~", "", text).strip(" "))
1158
1159        add_hline('<table%s>' % self._html_class_str_from_tag('table'))
1160        # Check if first cell of first row is a header cell. If so, assume the whole row is a header row.
1161        if rows and rows[0] and re.match(r"^\s*~", rows[0][0]):
1162            add_hline('<thead>', 1)
1163            add_hline('<tr>', 2)
1164            for cell in rows[0]:
1165                add_hline("<th>{}</th>".format(format_cell(cell)), 3)
1166            add_hline('</tr>', 2)
1167            add_hline('</thead>', 1)
1168            # Only one header row allowed.
1169            rows = rows[1:]
1170        # If no more rows, don't create a tbody.
1171        if rows:
1172            add_hline('<tbody>', 1)
1173            for row in rows:
1174                add_hline('<tr>', 2)
1175                for cell in row:
1176                    add_hline('<td>{}</td>'.format(format_cell(cell)), 3)
1177                add_hline('</tr>', 2)
1178            add_hline('</tbody>', 1)
1179        add_hline('</table>')
1180        return '\n'.join(hlines) + '\n'
1181
1182    def _do_wiki_tables(self, text):
1183        # Optimization.
1184        if "||" not in text:
1185            return text
1186
1187        less_than_tab = self.tab_width - 1
1188        wiki_table_re = re.compile(r'''
1189            (?:(?<=\n\n)|\A\n?)            # leading blank line
1190            ^([ ]{0,%d})\|\|.+?\|\|[ ]*\n  # first line
1191            (^\1\|\|.+?\|\|\n)*        # any number of subsequent lines
1192            ''' % less_than_tab, re.M | re.X)
1193        return wiki_table_re.sub(self._wiki_table_sub, text)
1194
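The row splitting inside `_wiki_table_sub()` can be isolated into a tiny sketch: each wiki-table line looks like `|| cell 1 || cell 2 ||`, the outer pair of bars is dropped, and escaped bars (`\||`) do not split:

```python
import re

def split_wiki_row(line):
    # Drop the outer `||` pair, then split on unescaped `||` separators,
    # as in _wiki_table_sub above.
    inner = line.strip()[2:-2].strip()
    return [c.strip() for c in re.split(r'(?<!\\)\|\|', inner)]

row = split_wiki_row('|| a || b || c ||')
```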
1195    def _run_span_gamut(self, text):
1196        # These are all the transformations that occur *within* block-level
1197        # tags like paragraphs, headers, and list items.
1198
1199        text = self._do_code_spans(text)
1200
1201        text = self._escape_special_chars(text)
1202
1203        # Process anchor and image tags.
1204        if "link-patterns" in self.extras:
1205            text = self._do_link_patterns(text)
1206
1207        text = self._do_links(text)
1208
1209        # Make links out of things like `<http://example.com/>`
1210        # Must come after _do_links(), because you can use < and >
1211        # delimiters in inline links like [this](<url>).
1212        text = self._do_auto_links(text)
1213
1214        text = self._encode_amps_and_angles(text)
1215
1216        if "strike" in self.extras:
1217            text = self._do_strike(text)
1218
1219        if "underline" in self.extras:
1220            text = self._do_underline(text)
1221
1222        text = self._do_italics_and_bold(text)
1223
1224        if "smarty-pants" in self.extras:
1225            text = self._do_smart_punctuation(text)
1226
1227        # Do hard breaks:
1228        if "break-on-newline" in self.extras:
1229            text = re.sub(r" *\n", "<br%s\n" % self.empty_element_suffix, text)
1230        else:
1231            text = re.sub(r" {2,}\n", " <br%s\n" % self.empty_element_suffix, text)
1232
1233        return text
1234
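The default hard-break rule at the end of the span gamut is easy to demonstrate: two or more trailing spaces before a newline become a `<br>`, with `empty_element_suffix` deciding between HTML and XHTML style (shown here with the XHTML `' />'` variant):

```python
import re

empty_element_suffix = ' />'   # XHTML-style; plain '>' for HTML
out = re.sub(r' {2,}\n', ' <br%s\n' % empty_element_suffix,
             'line one  \nline two\n')
```

With the `break-on-newline` extra, *every* newline gets the same treatment instead.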
1235    # "Sorta" because auto-links are identified as "tag" tokens.
1236    _sorta_html_tokenize_re = re.compile(r"""
1237        (
1238            # tag
1239            </?
1240            (?:\w+)                                     # tag name
1241            (?:\s+(?:[\w-]+:)?[\w-]+=(?:".*?"|'.*?'))*  # attributes
1242            \s*/?>
1243            |
1244            # auto-link (e.g., <http://www.activestate.com/>)
1245            <[\w~:/?#\[\]@!$&'\(\)*+,;%=\.\\-]+>
1246            |
1247            <!--.*?-->      # comment
1248            |
1249            <\?.*?\?>       # processing instruction
1250        )
1251        """, re.X)
1252
1253    def _escape_special_chars(self, text):
1254        # Python markdown note: the HTML tokenization here differs from
1255        # that in Markdown.pl, hence the behaviour for subtle cases can
1256        # differ (I believe the tokenizer here does a better job because
1257        # it isn't susceptible to unmatched '<' and '>' in HTML tags).
1258        # Note, however, that '>' is not allowed in an auto-link URL
1259        # here.
1260        escaped = []
1261        is_html_markup = False
1262        for token in self._sorta_html_tokenize_re.split(text):
1263            if is_html_markup:
1264                # Within tags/HTML-comments/auto-links, encode * and _
1265                # so they don't conflict with their use in Markdown for
1266                # italics and strong.  We're replacing each such
1267                # character with its corresponding MD5 checksum value;
1268                # this is likely overkill, but it should prevent us from
1269                # colliding with the escape values by accident.
1270                escaped.append(token.replace('*', self._escape_table['*'])
1271                                    .replace('_', self._escape_table['_']))
1272            else:
1273                escaped.append(self._encode_backslash_escapes(token))
1274            is_html_markup = not is_html_markup
1275        return ''.join(escaped)
1276
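`_escape_special_chars()` leans on a `re.split` detail: when the pattern has a capturing group, the captured separators are kept in the result, so the list alternates non-HTML text and HTML markup, starting with a (possibly empty) text token — exactly the invariant the `is_html_markup` flag tracks. A toy demonstration with a much simpler tag pattern:

```python
import re

# Toy stand-in for _sorta_html_tokenize_re; the capturing group makes
# re.split return the separators (the tags) too.
tag_re = re.compile(r'(<[^>]+>)')

tokens = tag_re.split('a *b* <em>c</em>')
```

Even-indexed tokens are plain text (backslash escapes get encoded); odd-indexed tokens are markup (`*` and `_` get hashed so they cannot be mistaken for emphasis).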
1277    def _hash_html_spans(self, text):
1278        # Used for safe_mode.
1279
1280        def _is_auto_link(s):
1281            if ':' in s and self._auto_link_re.match(s):
1282                return True
1283            elif '@' in s and self._auto_email_link_re.match(s):
1284                return True
1285            return False
1286
1287        tokens = []
1288        is_html_markup = False
1289        for token in self._sorta_html_tokenize_re.split(text):
1290            if is_html_markup and not _is_auto_link(token):
1291                sanitized = self._sanitize_html(token)
1292                key = _hash_text(sanitized)
1293                self.html_spans[key] = sanitized
1294                tokens.append(key)
1295            else:
1296                tokens.append(self._encode_incomplete_tags(token))
1297            is_html_markup = not is_html_markup
1298        return ''.join(tokens)
1299
1300    def _unhash_html_spans(self, text):
1301        for key, sanitized in list(self.html_spans.items()):
1302            text = text.replace(key, sanitized)
1303        return text
1304
1305    def _sanitize_html(self, s):
1306        if self.safe_mode == "replace":
1307            return self.html_removed_text
1308        elif self.safe_mode == "escape":
1309            replacements = [
1310                ('&', '&amp;'),
1311                ('<', '&lt;'),
1312                ('>', '&gt;'),
1313            ]
1314            for before, after in replacements:
1315                s = s.replace(before, after)
1316            return s
1317        else:
1318            raise MarkdownError("invalid value for 'safe_mode': %r (must be "
1319                                "'escape' or 'replace')" % self.safe_mode)
1320
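In the `"escape"` branch above, the order of the replacements matters: `'&'` must be handled first, otherwise the ampersands introduced by `&lt;` and `&gt;` would themselves be escaped to `&amp;lt;`. A standalone copy of that branch:

```python
def escape_html(s):
    # Mirrors the "escape" safe_mode branch: '&' first, then '<' and '>'.
    for before, after in [('&', '&amp;'), ('<', '&lt;'), ('>', '&gt;')]:
        s = s.replace(before, after)
    return s

escaped = escape_html('<script>1 & 2</script>')
```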
1321    _inline_link_title = re.compile(r'''
1322            (                   # \1
1323              [ \t]+
1324              (['"])            # quote char = \2
1325              (?P<title>.*?)
1326              \2
1327            )?                  # title is optional
1328          \)$
1329        ''', re.X | re.S)
1330    _tail_of_reference_link_re = re.compile(r'''
1331          # Match tail of: [text][id]
1332          [ ]?          # one optional space
1333          (?:\n[ ]*)?   # one optional newline followed by spaces
1334          \[
1335            (?P<id>.*?)
1336          \]
1337        ''', re.X | re.S)
1338
1339    _whitespace = re.compile(r'\s*')
1340
1341    _strip_anglebrackets = re.compile(r'<(.*)>.*')
1342
1343    def _find_non_whitespace(self, text, start):
1344        """Returns the index of the first non-whitespace character in text
1345        after (and including) start
1346        """
1347        match = self._whitespace.match(text, start)
1348        return match.end()
1349
1350    def _find_balanced(self, text, start, open_c, close_c):
1351        """Returns the index where the open_c and close_c characters balance
1352        out - the same number of open_c and close_c are encountered - or the
1353        end of string if it's reached before the balance point is found.
1354        """
1355        i = start
1356        l = len(text)
1357        count = 1
1358        while count > 0 and i < l:
1359            if text[i] == open_c:
1360                count += 1
1361            elif text[i] == close_c:
1362                count -= 1
1363            i += 1
1364        return i
1365
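`_find_balanced()` assumes the opening delimiter has already been consumed — `start` points just past it, so the count begins at 1 — and returns the index one past the matching close (or the end of the string). A standalone copy with a worked call:

```python
def find_balanced(text, start, open_c, close_c):
    # `start` points just past an opening delimiter, so count starts at 1;
    # returns the index one past the matching close (or end of string).
    i, count = start, 1
    while count > 0 and i < len(text):
        if text[i] == open_c:
            count += 1
        elif text[i] == close_c:
            count -= 1
        i += 1
    return i

s = '(url (nested)) tail'
end = find_balanced(s, 1, '(', ')')   # scan starts after the first '('
```

So `s[1:end-1]` is the balanced interior, nested parentheses included — which is how `_extract_url_and_title()` below bounds an inline link's `(...)` tail.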
1366    def _extract_url_and_title(self, text, start):
1367        """Extracts the url and (optional) title from the tail of a link"""
1368        # text[start] equals the opening parenthesis
1369        idx = self._find_non_whitespace(text, start+1)
1370        if idx == len(text):
1371            return None, None, None
1372        end_idx = idx
1373        has_anglebrackets = text[idx] == "<"
1374        if has_anglebrackets:
1375            end_idx = self._find_balanced(text, end_idx+1, "<", ">")
1376        end_idx = self._find_balanced(text, end_idx, "(", ")")
1377        match = self._inline_link_title.search(text, idx, end_idx)
1378        if not match:
1379            return None, None, None
1380        url, title = text[idx:match.start()], match.group("title")
1381        if has_anglebrackets:
1382            url = self._strip_anglebrackets.sub(r'\1', url)
1383        return url, title, end_idx
1384
1385    _safe_protocols = re.compile(r'(https?|ftp):', re.I)
1386    def _do_links(self, text):
1387        """Turn Markdown link shortcuts into XHTML <a> and <img> tags.
1388
1389        This is a combination of Markdown.pl's _DoAnchors() and
1390        _DoImages(). They are done together because that simplified the
1391        approach. A different approach than Markdown.pl's was necessary
1392        because Python's regex engine lacks the atomic matching that
1393        Markdown.pl's $g_nested_brackets relies on.
1394        """
1395        MAX_LINK_TEXT_SENTINEL = 3000  # markdown2 issue 24
1396
1397        # `anchor_allowed_pos` is used to support img links inside
1398        # anchors, but not anchors inside anchors. An anchor's start
1399        # pos must be `>= anchor_allowed_pos`.
1400        anchor_allowed_pos = 0
1401
1402        curr_pos = 0
1403        while True:  # Handle the next link.
1404            # The next '[' is the start of:
1405            # - an inline anchor:   [text](url "title")
1406            # - a reference anchor: [text][id]
1407            # - an inline img:      ![text](url "title")
1408            # - a reference img:    ![text][id]
1409            # - a footnote ref:     [^id]
1410            #   (Only if 'footnotes' extra enabled)
1411            # - a footnote defn:    [^id]: ...
1412            #   (Only if 'footnotes' extra enabled) These have already
1413            #   been stripped in _strip_footnote_definitions() so no
1414            #   need to watch for them.
1415            # - a link definition:  [id]: url "title"
1416            #   These have already been stripped in
1417            #   _strip_link_definitions() so no need to watch for them.
1418            # - not markup:         [...anything else...
1419            try:
1420                start_idx = text.index('[', curr_pos)
1421            except ValueError:
1422                break
1423            text_length = len(text)
1424
1425            # Find the matching closing ']'.
1426            # Markdown.pl allows *matching* brackets in link text so we
1427            # will here too. Markdown.pl *doesn't* currently allow
1428            # matching brackets in img alt text -- we'll differ in that
1429            # regard.
1430            bracket_depth = 0
1431            for p in range(start_idx+1, min(start_idx+MAX_LINK_TEXT_SENTINEL,
1432                                            text_length)):
1433                ch = text[p]
1434                if ch == ']':
1435                    bracket_depth -= 1
1436                    if bracket_depth < 0:
1437                        break
1438                elif ch == '[':
1439                    bracket_depth += 1
1440            else:
1441                # Closing bracket not found within sentinel length.
1442                # This isn't markup.
1443                curr_pos = start_idx + 1
1444                continue
1445            link_text = text[start_idx+1:p]
1446
1447            # Fix for issue 341 - Injecting XSS into link text
1448            if self.safe_mode:
1449                link_text = self._hash_html_spans(link_text)
1450                link_text = self._unhash_html_spans(link_text)
1451
1452            # Possibly a footnote ref?
1453            if "footnotes" in self.extras and link_text.startswith("^"):
1454                normed_id = re.sub(r'\W', '-', link_text[1:])
1455                if normed_id in self.footnotes:
1456                    self.footnote_ids.append(normed_id)
1457                    result = '<sup class="footnote-ref" id="fnref-%s">' \
1458                             '<a href="#fn-%s">%s</a></sup>' \
1459                             % (normed_id, normed_id, len(self.footnote_ids))
1460                    text = text[:start_idx] + result + text[p+1:]
1461                else:
1462                    # This id isn't defined, leave the markup alone.
1463                    curr_pos = p+1
1464                continue
1465
1466            # Now determine what this is by the remainder.
1467            p += 1
1468            if p == text_length:
1469                return text
1470
1471            # Inline anchor or img?
1472            if text[p] == '(':  # attempt at perf improvement
1473                url, title, url_end_idx = self._extract_url_and_title(text, p)
1474                if url is not None:
1475                    # Handle an inline anchor or img.
1476                    is_img = start_idx > 0 and text[start_idx-1] == "!"
1477                    if is_img:
1478                        start_idx -= 1
1479
1480                    # We've got to encode these to avoid conflicting
1481                    # with italics/bold.
1482                    url = url.replace('*', self._escape_table['*']) \
1483                             .replace('_', self._escape_table['_'])
1484                    if title:
1485                        title_str = ' title="%s"' % (
1486                            _xml_escape_attr(title)
1487                                .replace('*', self._escape_table['*'])
1488                                .replace('_', self._escape_table['_']))
1489                    else:
1490                        title_str = ''
1491                    if is_img:
1492                        img_class_str = self._html_class_str_from_tag("img")
1493                        result = '<img src="%s" alt="%s"%s%s%s' \
1494                            % (_html_escape_url(url, safe_mode=self.safe_mode),
1495                               _xml_escape_attr(link_text),
1496                               title_str,
1497                               img_class_str,
1498                               self.empty_element_suffix)
1499                        if "smarty-pants" in self.extras:
1500                            result = result.replace('"', self._escape_table['"'])
1501                        curr_pos = start_idx + len(result)
1502                        text = text[:start_idx] + result + text[url_end_idx:]
1503                    elif start_idx >= anchor_allowed_pos:
1504                        safe_link = self._safe_protocols.match(url) or url.startswith('#')
1505                        if self.safe_mode and not safe_link:
1506                            result_head = '<a href="#"%s>' % (title_str)
1507                        else:
1508                            result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str)
1509                        result = '%s%s</a>' % (result_head, link_text)
1510                        if "smarty-pants" in self.extras:
1511                            result = result.replace('"', self._escape_table['"'])
1512                        # <img> allowed from curr_pos on, <a> from
1513                        # anchor_allowed_pos on.
1514                        curr_pos = start_idx + len(result_head)
1515                        anchor_allowed_pos = start_idx + len(result)
1516                        text = text[:start_idx] + result + text[url_end_idx:]
1517                    else:
1518                        # Anchor not allowed here.
1519                        curr_pos = start_idx + 1
1520                    continue
1521
1522            # Reference anchor or img?
1523            else:
1524                match = self._tail_of_reference_link_re.match(text, p)
1525                if match:
1526                    # Handle a reference-style anchor or img.
1527                    is_img = start_idx > 0 and text[start_idx-1] == "!"
1528                    if is_img:
1529                        start_idx -= 1
1530                    link_id = match.group("id").lower()
1531                    if not link_id:
1532                        link_id = link_text.lower()  # for links like [this][]
1533                    if link_id in self.urls:
1534                        url = self.urls[link_id]
1535                        # We've got to encode these to avoid conflicting
1536                        # with italics/bold.
1537                        url = url.replace('*', self._escape_table['*']) \
1538                                 .replace('_', self._escape_table['_'])
1539                        title = self.titles.get(link_id)
1540                        if title:
1541                            title = _xml_escape_attr(title) \
1542                                .replace('*', self._escape_table['*']) \
1543                                .replace('_', self._escape_table['_'])
1544                            title_str = ' title="%s"' % title
1545                        else:
1546                            title_str = ''
1547                        if is_img:
1548                            img_class_str = self._html_class_str_from_tag("img")
1549                            result = '<img src="%s" alt="%s"%s%s%s' \
1550                                % (_html_escape_url(url, safe_mode=self.safe_mode),
1551                                   _xml_escape_attr(link_text),
1552                                   title_str,
1553                                   img_class_str,
1554                                   self.empty_element_suffix)
1555                            if "smarty-pants" in self.extras:
1556                                result = result.replace('"', self._escape_table['"'])
1557                            curr_pos = start_idx + len(result)
1558                            text = text[:start_idx] + result + text[match.end():]
1559                        elif start_idx >= anchor_allowed_pos:
1560                            if self.safe_mode and not self._safe_protocols.match(url):
1561                                result_head = '<a href="#"%s>' % (title_str)
1562                            else:
1563                                result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str)
1564                            result = '%s%s</a>' % (result_head, link_text)
1565                            if "smarty-pants" in self.extras:
1566                                result = result.replace('"', self._escape_table['"'])
1567                            # <img> allowed from curr_pos on, <a> from
1568                            # anchor_allowed_pos on.
1569                            curr_pos = start_idx + len(result_head)
1570                            anchor_allowed_pos = start_idx + len(result)
1571                            text = text[:start_idx] + result + text[match.end():]
1572                        else:
1573                            # Anchor not allowed here.
1574                            curr_pos = start_idx + 1
1575                    else:
1576                        # This id isn't defined, leave the markup alone.
1577                        curr_pos = match.end()
1578                    continue
1579
1580            # Otherwise, it isn't markup.
1581            curr_pos = start_idx + 1
1582
1583        return text
1584
1585    def header_id_from_text(self, text, prefix, n):
1586        """Generate a header id attribute value from the given header
1587        HTML content.
1588
1589        This is only called if the "header-ids" extra is enabled.
1590        Subclasses may override this for different header ids.
1591
1592        @param text {str} The text of the header tag
1593        @param prefix {str} The requested prefix for header ids. This is the
1594            value of the "header-ids" extra key, if any. Otherwise, None.
1595        @param n {int} The <hN> tag number, i.e. `1` for an <h1> tag.
1596        @returns {str} The value for the header tag's "id" attribute. Return
1597            None to not have an id attribute and to exclude this header from
1598            the TOC (if the "toc" extra is specified).