
pdoc.search

pdoc has a search box which allows users to quickly find relevant parts in the documentation. This feature is implemented entirely client-side so that pdoc can still be hosted statically, and works without any third-party services in a privacy-preserving way. When a user focuses the search box for the first time, pdoc will fetch the search index (search.js) and use that to answer all upcoming queries.

Search Coverage

The search functionality covers all documented elements and their docstrings. You may find documentation objects using their name, arguments, or type annotations; the source code is not considered.

Search Performance

pdoc uses Elasticlunr.js to implement search. To improve end user performance, pdoc will attempt to precompile the search index when building the documentation. This only works if nodejs is available, and pdoc gracefully falls back to client-side index building if this is not the case.

If your search index reaches a size where compilation times are meaningful (more than 3 MB of raw JSON) and nodejs cannot be invoked, pdoc will print a notice when building your documentation. In this case it should be enough to install a recent version of Node.js on your system and make a nodejs or node executable available on your PATH. There are no other dependencies: pdoc only uses node to interpret a local JS file and does not download any additional packages.
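
Before building, it can be useful to check whether a Node.js executable is reachable at all. The following is a minimal sketch of such a check, not pdoc's own code: the helper name `find_node_executable` is hypothetical, and it simply asks `shutil.which` for the two executable names mentioned above.

```python
from __future__ import annotations

import shutil


def find_node_executable() -> str | None:
    """Return the path of the first Node.js executable found on PATH, or None.

    Hypothetical helper for illustration; it is not part of pdoc.
    """
    # Try `nodejs` first and then `node`, the two names mentioned above.
    for name in ("nodejs", "node"):
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_node_executable()
    if found:
        print(f"Node.js found at {found}; precompilation should be possible.")
    else:
        print("No Node.js executable on PATH; the index would be built client-side.")
```

If the helper returns `None`, installing Node.js or adjusting PATH should be enough for the precompilation step to succeed on the next build.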

You can test if your search index is precompiled by clicking the search box (so that the search index is fetched) and then checking your browser's developer console.

Search Index Size

The search index can be relatively large as it includes all docstrings. For larger projects, you should make sure that you have HTTP compression and caching enabled. search.js usually compresses to about 10% of its original size. For example, pdoc's own precompiled search index compresses from 312kB to 27kB.
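
To get a feel for how well an index compresses, you can gzip a JSON payload locally and compare sizes. The sketch below uses a synthetic, highly repetitive list of documents as a stand-in for a real index (the entries are made up for illustration), so its ratio will be better than what a real `search.js` achieves. It also works out the ratio implied by the figures quoted above.

```python
import gzip
import json

# Synthetic stand-in for a search index: made-up entries, not real pdoc output.
documents = [
    {
        "fullname": f"pkg.mod.func_{i}",
        "kind": "function",
        "doc": "<p>Returns the computed value for the given input.</p>",
    }
    for i in range(1000)
]

raw = json.dumps(documents).encode()
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")

# Ratio implied by the figures quoted above (312kB -> 27kB), roughly 9%.
quoted_ratio = 27 / 312
print(f"quoted example: {quoted_ratio:.1%} of original")
```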

Disabling Search

If you wish to disable the search functionality, you can pass --no-search when invoking pdoc.

"""
pdoc has a search box which allows users to quickly find relevant parts in the documentation.
This feature is implemented entirely client-side so that pdoc can still be hosted statically,
and works without any third-party services in a privacy-preserving way. When a user focuses the
search box for the first time, pdoc will fetch the search index (`search.js`) and use that to
answer all upcoming queries.

##### Search Coverage

The search functionality covers all documented elements and their docstrings.
You may find documentation objects using their name, arguments, or type annotations; the source code is not considered.

##### Search Performance

pdoc uses [Elasticlunr.js](https://github.com/weixsong/elasticlunr.js) to implement search. To improve end user
performance, pdoc will attempt to precompile the search index when building the documentation. This only works if
`nodejs` is available, and pdoc gracefully falls back to client-side index building if this is not the case.

If your search index reaches a size where compilation times are meaningful (more than 3 MB of raw JSON) and `nodejs`
cannot be invoked, pdoc will print a notice when building your documentation. In this case it should be enough to install
a recent version of [Node.js](https://nodejs.org/) on your system and make a `nodejs` or `node` executable available on
your PATH. There are no other dependencies: pdoc only uses `node` to interpret a local JS file and does not download any
additional packages.

You can test if your search index is precompiled by clicking the search box (so that the search index is fetched) and
then checking your browser's developer console.

##### Search Index Size

The search index can be relatively large as it includes all docstrings. For larger projects, you should make sure that
you have [HTTP compression](https://en.wikipedia.org/wiki/HTTP_compression) and caching enabled. `search.js` usually
compresses to about 10% of its original size. For example, pdoc's own precompiled search index compresses from 312kB
to 27kB.

##### Disabling Search

If you wish to disable the search functionality, you can pass `--no-search` when invoking pdoc.
"""
from __future__ import annotations

import html
import json
import shutil
import subprocess
import textwrap
from collections.abc import Callable
from collections.abc import Mapping
from pathlib import Path

import pdoc.doc
from pdoc.render_helpers import format_signature
from pdoc.render_helpers import to_html
from pdoc.render_helpers import to_markdown


def make_index(
    all_modules: Mapping[str, pdoc.doc.Module],
    is_public: Callable[[pdoc.doc.Doc], bool],
    default_docformat: str,
) -> list[dict]:
    """
    This method compiles all currently documented modules into a pile of documentation JSON objects,
    which can then be ingested by Elasticlunr.js.
    """

    documents = []
    for modname, module in all_modules.items():

        def make_item(doc: pdoc.doc.Doc, **kwargs) -> dict[str, str]:
            # TODO: We could be extra fancy here and split `doc.docstring` by toc sections.
            ret = {
                "fullname": doc.fullname,
                "modulename": doc.modulename,
                "qualname": doc.qualname,
                "kind": doc.kind,
                "doc": to_html(to_markdown(doc.docstring, module, default_docformat)),
                **kwargs,
            }
            return {k: v for k, v in ret.items() if v}

        # TODO: Instead of building our own JSON objects here we could also use module.html.jinja2's member()
        #  implementation to render HTML for each documentation object and then implement a elasticlunr tokenizer that
        #  removes HTML. It wouldn't be great for search index size, but the rendered search entries would be fully
        #  consistent.
        def make_index(mod: pdoc.doc.Namespace, **extra):
            if not is_public(mod):
                return
            yield make_item(mod, **extra)
            for m in mod.own_members:
                if isinstance(m, pdoc.doc.Variable) and is_public(m):
                    yield make_item(
                        m,
                        annotation=html.escape(m.annotation_str),
                        default_value=html.escape(m.default_value_str),
                    )
                elif isinstance(m, pdoc.doc.Function) and is_public(m):
                    if m.name == "__init__":
                        yield make_item(
                            m,
                            signature=format_signature(m.signature_without_self, False),
                        )
                    else:
                        yield make_item(
                            m,
                            signature=format_signature(m.signature, True),
                            funcdef=m.funcdef,
                        )
                elif isinstance(m, pdoc.doc.Class):
                    yield from make_index(
                        m,
                        bases=", ".join(x[2] for x in m.bases),
                    )
                else:
                    pass

        documents.extend(make_index(module))

    return documents


def precompile_index(documents: list[dict], compile_js: Path) -> str:
    """
    This method tries to precompile the Elasticlunr.js search index by invoking `nodejs` or `node`.
    If that fails, an unprocessed index will be returned (which will be compiled locally on the client side).
    If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.

    We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or
    – even better – a Python-based search index generation similar to
    [elasticlunr-rs](https://github.com/mattico/elasticlunr-rs) that could be shipped as part of pdoc.
    """
    raw = json.dumps(documents)
    try:
        if shutil.which("nodejs"):
            executable = "nodejs"
        else:
            executable = "node"
        out = subprocess.check_output(
            [executable, compile_js],
            input=raw.encode(),
            cwd=Path(__file__).parent / "templates",
            stderr=subprocess.STDOUT,
        )
        index = json.loads(out)
        index["_isPrebuiltIndex"] = True
    except Exception as e:
        if len(raw) > 3 * 1024 * 1024:
            print(
                f"pdoc failed to precompile the search index: {e}\n"
                f"Search will work, but may be slower. "
                f"This error may only show up now because your index has reached a certain size. "
                f"See https://pdoc.dev/docs/pdoc/search.html for details."
            )
            if isinstance(e, subprocess.CalledProcessError):
                print(f"{' Node.js Output ':=^80}")
                print(
                    textwrap.indent(e.output.decode("utf8", "replace"), "    ").rstrip()
                )
                print("=" * 80)
        return raw
    else:
        return json.dumps(index)
def make_index(all_modules: collections.abc.Mapping[str, pdoc.doc.Module], is_public: collections.abc.Callable[[pdoc.doc.Doc], bool], default_docformat: str) -> list[dict]:

This method compiles all currently documented modules into a pile of documentation JSON objects, which can then be ingested by Elasticlunr.js.
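
To make the shape of these JSON objects concrete, here is a small sketch of a single index entry. It assumes only what the source listing shows: each entry carries `fullname`, `modulename`, `qualname`, `kind`, and `doc`, plus optional extras such as `signature`, and keys with empty values are dropped. The `make_entry` helper and the sample values are hypothetical and stand in for the real `pdoc.doc` objects.

```python
from __future__ import annotations

import json


def make_entry(**fields: str) -> dict[str, str]:
    """Build one index entry, dropping keys whose values are empty.

    Hypothetical stand-in for the nested `make_item` helper; it takes plain
    strings instead of a `pdoc.doc.Doc` object.
    """
    return {key: value for key, value in fields.items() if value}


entry = make_entry(
    fullname="pdoc.search.make_index",
    modulename="pdoc.search",
    qualname="make_index",
    kind="function",
    doc="<p>Compiles all documented modules into JSON objects.</p>",
    signature="",  # empty, so this key is omitted from the entry
)

print(json.dumps(entry, indent=2))
```

Dropping empty values keeps the serialized index smaller, which matters because the whole index is downloaded by the browser.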

def precompile_index(documents: list[dict], compile_js: pathlib.Path) -> str:

This method tries to precompile the Elasticlunr.js search index by invoking nodejs or node. If that fails, an unprocessed index will be returned (which will be compiled locally on the client side). If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.

We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or – even better – a Python-based search index generation similar to elasticlunr-rs that could be shipped as part of pdoc.
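
The try-then-fall-back behaviour described above can be sketched in isolation. This is a simplified illustration rather than pdoc's implementation: `compile_or_fallback` and `is_prebuilt` are hypothetical names, the warning output is omitted, and a Python one-liner stands in for the Node.js compile script so that the example runs without Node.js installed. The only detail taken from the source is the `_isPrebuiltIndex` marker set on a successfully compiled index.

```python
from __future__ import annotations

import json
import subprocess
import sys


def compile_or_fallback(documents: list[dict], command: list[str]) -> str:
    """Pipe the raw documents through `command`; return the raw JSON if that fails."""
    raw = json.dumps(documents)
    try:
        out = subprocess.check_output(
            command, input=raw.encode(), stderr=subprocess.STDOUT
        )
        index = json.loads(out)
        index["_isPrebuiltIndex"] = True
    except Exception:
        # Missing executable, non-zero exit status, or unparsable output:
        # hand back the unprocessed documents for client-side index building.
        return raw
    return json.dumps(index)


def is_prebuilt(payload: str) -> bool:
    """Tell a precompiled index (a marked object) from raw documents (a list)."""
    data = json.loads(payload)
    return isinstance(data, dict) and data.get("_isPrebuiltIndex") is True


# Stand-in "compiler" (an assumption for this sketch): wraps stdin in an object.
FAKE_COMPILER = [
    sys.executable,
    "-c",
    "import sys, json; print(json.dumps({'docs': json.load(sys.stdin)}))",
]
docs = [{"fullname": "pkg.mod", "kind": "module"}]

compiled = compile_or_fallback(docs, FAKE_COMPILER)
fallback = compile_or_fallback(docs, ["hypothetical-missing-executable-for-demo"])
print(is_prebuilt(compiled), is_prebuilt(fallback))
```

Catching a broad `Exception` mirrors the source listing: any failure in the external step degrades to slower client-side indexing instead of breaking the documentation build.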