
pdoc.search

pdoc has a search box which allows users to quickly find relevant parts in the documentation. This feature is implemented entirely client-side so that pdoc can still be hosted statically, and works without any third-party services in a privacy-preserving way. When a user focuses the search box for the first time, pdoc will fetch the search index (search.js) and use that to answer all upcoming queries.

Search Coverage

The search functionality covers all documented elements and their docstrings. You may find documentation objects using their name, arguments, or type annotations; the source code is not considered.

Search Performance

pdoc uses Elasticlunr.js to implement search. To improve end user performance, pdoc will attempt to precompile the search index when building the documentation. This only works if nodejs is available, and pdoc gracefully falls back to client-side index building if this is not the case.

If your search index reaches a size where compilation times are meaningful and nodejs cannot be invoked, pdoc will print a notice when building your documentation. In this case it should be enough to install a recent version of Node.js on your system and make a nodejs or node executable available on your PATH. There are no additional dependencies: pdoc only uses node to interpret a local JS file; it does not download any additional packages.
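
If you are not sure whether pdoc can find a runtime, a quick check along these lines mirrors the executable lookup that pdoc itself performs (a minimal sketch; the printed messages are purely illustrative):

```python
import shutil

# pdoc prefers a `nodejs` binary and falls back to `node` (see precompile_index below).
node = shutil.which("nodejs") or shutil.which("node")
if node:
    print(f"Search index can be precompiled with {node}")
else:
    print("No Node.js runtime found; the index will be built in the browser instead.")
```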

You can test if your search index is precompiled by clicking the search box (so that the search index is fetched) and then checking your browser's developer console.
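
Alternatively, you can inspect the generated file directly: a precompiled index carries the _isPrebuiltIndex marker that precompile_index() (see the source below) adds to it. A rough check, assuming search.js sits next to your generated HTML and embeds the index JSON verbatim:

```python
from pathlib import Path

# Adjust the path to wherever pdoc wrote your documentation.
search_js = Path("docs/search.js")

if "_isPrebuiltIndex" in search_js.read_text(encoding="utf8"):
    print("search.js ships a precompiled index")
else:
    print("search.js will compile the index in the browser")
```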

Search Index Size

The search index can be relatively large as it includes all docstrings. For larger projects, you should make sure that you have HTTP compression and caching enabled. search.js usually compresses to about 10% of its original size. For example, pdoc's own precompiled search index compresses from 312kB to 27kB.
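
To estimate what HTTP compression would buy you for your own project, you can gzip the generated file locally (a small sketch; the docs/ path is only an example):

```python
import gzip
from pathlib import Path

data = Path("docs/search.js").read_bytes()
raw_kb = len(data) / 1024
gz_kb = len(gzip.compress(data)) / 1024
print(f"search.js: {raw_kb:.0f} kB raw, {gz_kb:.0f} kB gzipped ({gz_kb / raw_kb:.0%} of original)")
```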

Disabling Search

If you wish to disable the search functionality, you can pass --no-search when invoking pdoc.
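
For example, when driving pdoc from a build script (the module name and output directory below are placeholders):

```python
import subprocess

# Generate documentation without the search box.
subprocess.run(["pdoc", "--no-search", "-o", "docs/", "mypackage"], check=True)
```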

  1"""
  2pdoc has a search box which allows users to quickly find relevant parts in the documentation.
  3This feature is implemented entirely client-side so that pdoc can still be hosted statically,
  4and works without any third-party services in a privacy-preserving way. When a user focuses the
  5search box for the first time, pdoc will fetch the search index (`search.js`) and use that to
  6answer all upcoming queries.
  7
  8##### Search Coverage
  9
 10The search functionality covers all documented elements and their docstrings.
 11You may find documentation objects using their name, arguments, or type annotations; the source code is not considered.
 12
 13##### Search Performance
 14
 15pdoc uses [Elasticlunr.js](https://github.com/weixsong/elasticlunr.js) to implement search. To improve end user
 16performance, pdoc will attempt to precompile the search index when building the documentation. This only works if
 17`nodejs` is available, and pdoc gracefully falls back to client-side index building if this is not the case.
 18
 19If your search index reaches a size where compilation times are meaningful and `nodejs` cannot be invoked,
 20pdoc will let you know and print a notice when building your documentation. In this case it should be enough to install
 21a recent version of [Node.js](https://nodejs.org/) on your system and make a `nodejs` or `node` available on your PATH.
 22There are no other additional dependencies. pdoc only uses `node` to interpret a local JS file, it does not download any
 23additional packages.
 24
 25You can test if your search index is precompiled by clicking the search box (so that the search index is fetched) and
 26then checking your browser's developer console.
 27
 28##### Search Index Size
 29
 30The search index can be relatively large as it includes all docstrings. For larger projects, you should make sure that
 31you have [HTTP compression](https://en.wikipedia.org/wiki/HTTP_compression) and caching enabled. `search.js` usually
 32compresses to about 10% of its original size. For example, pdoc's own precompiled search index compresses from 312kB
 33to 27kB.
 34
 35##### Disabling Search
 36
 37If you wish to disable the search functionality, you can pass `--no-search` when invoking pdoc.
 38"""
 39from __future__ import annotations
 40
 41import html
 42
 43import json
 44import shutil
 45import subprocess
 46import textwrap
 47from collections.abc import Callable, Mapping
 48from pathlib import Path
 49
 50import pdoc.doc
 51from pdoc.render_helpers import to_html, to_markdown, format_signature
 52
 53
 54def make_index(
 55    all_modules: Mapping[str, pdoc.doc.Module],
 56    is_public: Callable[[pdoc.doc.Doc], bool],
 57    default_docformat: str,
 58) -> list[dict]:
 59    """
 60    This method compiles all currently documented modules into a pile of documentation JSON objects,
 61    which can then be ingested by Elasticlunr.js.
 62    """
 63
 64    documents = []
 65    for modname, module in all_modules.items():
 66
 67        def make_item(doc: pdoc.doc.Doc, **kwargs) -> dict[str, str]:
 68            # TODO: We could be extra fancy here and split `doc.docstring` by toc sections.
 69            ret = {
 70                "fullname": doc.fullname,
 71                "modulename": doc.modulename,
 72                "qualname": doc.qualname,
 73                "type": doc.type,
 74                "doc": to_html(to_markdown(doc.docstring, module, default_docformat)),
 75                **kwargs,
 76            }
 77            return {k: v for k, v in ret.items() if v}
 78
 79        # TODO: Instead of building our own JSON objects here we could also use module.html.jinja2's member()
 80        #  implementation to render HTML for each documentation object and then implement a elasticlunr tokenizer that
 81        #  removes HTML. It wouldn't be great for search index size, but the rendered search entries would be fully
 82        #  consistent.
 83        def make_index(mod: pdoc.doc.Namespace, **extra):
 84            if not is_public(mod):
 85                return
 86            yield make_item(mod, **extra)
 87            for m in mod.own_members:
 88                if isinstance(m, pdoc.doc.Variable) and is_public(m):
 89                    yield make_item(
 90                        m,
 91                        annotation=html.escape(m.annotation_str),
 92                        default_value=html.escape(m.default_value_str),
 93                    )
 94                elif isinstance(m, pdoc.doc.Function) and is_public(m):
 95                    if m.name == "__init__":
 96                        yield make_item(
 97                            m,
 98                            signature=format_signature(m.signature_without_self, False),
 99                        )
100                    else:
101                        yield make_item(
102                            m,
103                            signature=format_signature(m.signature, True),
104                            funcdef=m.funcdef,
105                        )
106                elif isinstance(m, pdoc.doc.Class):
107                    yield from make_index(
108                        m,
109                        bases=", ".join(x[2] for x in m.bases),
110                    )
111                else:
112                    pass
113
114        documents.extend(make_index(module))
115
116    return documents
117
118
119def precompile_index(documents: list[dict], compile_js: Path) -> str:
120    """
121    This method tries to precompile the Elasticlunr.js search index by invoking `nodejs` or `node`.
122    If that fails, an unprocessed index will be returned (which will be compiled locally on the client side).
123    If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.
124
125    We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or
126    – even better – a Python-based search index generation similar to
127    [elasticlunr-rs](https://github.com/mattico/elasticlunr-rs) that could be shipped as part of pdoc.
128    """
129    raw = json.dumps(documents)
130    try:
131        if shutil.which("nodejs"):
132            executable = "nodejs"
133        else:
134            executable = "node"
135        out = subprocess.check_output(
136            [executable, compile_js],
137            input=raw.encode(),
138            cwd=Path(__file__).parent / "templates",
139            stderr=subprocess.STDOUT,
140        )
141        index = json.loads(out)
142        index["_isPrebuiltIndex"] = True
143    except Exception as e:
144        if len(raw) > 3 * 1024 * 1024:
145            print(
146                f"pdoc failed to precompile the search index: {e}\n"
147                f"Search will work, but may be slower. "
148                f"This error may only show up now because your index has reached a certain size. "
149                f"See https://pdoc.dev/docs/pdoc/search.html for details."
150            )
151            if isinstance(e, subprocess.CalledProcessError):
152                print(f"{' Node.js Output ':=^80}")
153                print(
154                    textwrap.indent(e.output.decode("utf8", "replace"), "    ").rstrip()
155                )
156                print("=" * 80)
157        return raw
158    else:
159        return json.dumps(index)
def make_index( all_modules: collections.abc.Mapping[str, pdoc.doc.Module], is_public: collections.abc.Callable[[pdoc.doc.Doc], bool], default_docformat: str) -> list[dict]:

This method compiles all currently documented modules into a pile of documentation JSON objects, which can then be ingested by Elasticlunr.js.
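
A minimal sketch of how make_index might be driven (the is_public predicate and the docformat are illustrative choices, not the only valid ones):

```python
import importlib

import pdoc.doc
from pdoc.search import make_index

# Wrap an imported module in pdoc's documentation model.
mod = pdoc.doc.Module(importlib.import_module("json"))

documents = make_index(
    all_modules={"json": mod},
    is_public=lambda doc: not doc.name.startswith("_"),
    default_docformat="restructuredtext",
)
print(f"{len(documents)} searchable documents")
```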

def precompile_index(documents: list[dict], compile_js: pathlib.Path) -> str:

This method tries to precompile the Elasticlunr.js search index by invoking nodejs or node. If that fails, an unprocessed index will be returned (which will be compiled locally on the client side). If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.

We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or – even better – a Python-based search index generation similar to elasticlunr-rs that could be shipped as part of pdoc.
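
A hedged usage sketch: the documents list would normally come from make_index() above, and the compile_js path is a placeholder for the Elasticlunr build script that pdoc invokes from its templates directory:

```python
from pathlib import Path

from pdoc.search import precompile_index

# `documents` as produced by make_index(); shortened to a single hand-written entry here.
documents = [
    {"fullname": "example.greet", "modulename": "example", "qualname": "greet", "type": "function"}
]

# Placeholder path -- point this at the JS script that compiles the index.
compile_js = Path("templates/build-search-index.js")

search_index = precompile_index(documents, compile_js)
# If Node.js (or the script) is unavailable, the raw document JSON is returned instead.
print("precompiled:", "_isPrebuiltIndex" in search_index)
```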