pdoc.search

pdoc has a search box that lets users quickly find relevant parts of the documentation. The feature is implemented entirely client-side, so the generated documentation can still be hosted statically and search works in a privacy-preserving way without any third-party services. When a user focuses the search box for the first time, pdoc fetches the search index (search.js) and uses it to answer all subsequent queries.

Single-Page Documentation

If pdoc is documenting a single module only, search functionality will be disabled. The browser's built-in search functionality will provide a better user experience in these cases.

Search Coverage

The search functionality covers all documented elements and their docstrings. You may find documentation objects using their name, arguments, or type annotations; the source code is not considered.

Search Performance

pdoc uses Elasticlunr.js (https://github.com/weixsong/elasticlunr.js) to implement search. To improve end-user performance, pdoc attempts to precompile the search index when building the documentation. This only works if nodejs is available; if it is not, pdoc gracefully falls back to building the index client-side.

If your search index reaches a size where compilation times are meaningful and nodejs cannot be invoked, pdoc prints a notice when building your documentation. In this case it should be enough to install a recent version of Node.js (https://nodejs.org/) on your system and make a nodejs or node executable available on your PATH. There are no other dependencies: pdoc only uses node to interpret a local JS file and does not download any additional packages.
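To check ahead of a build whether a Node.js binary would be found, the lookup can be approximated with the standard library. This is a minimal sketch that follows the order described above (nodejs first, then node); the helper name find_node is hypothetical and not part of pdoc's API.

```python
import shutil


def find_node(candidates=("nodejs", "node")):
    """Return the first candidate name found on PATH, or None.

    Hypothetical helper: it only approximates the lookup described above.
    """
    for name in candidates:
        if shutil.which(name):
            return name
    return None


if __name__ == "__main__":
    found = find_node()
    if found is None:
        print("No Node.js executable on PATH; the index would be built client-side.")
    else:
        print(f"Found {found!r} on PATH; precompilation should be possible.")
```

Running this in the same environment that builds the documentation (for example a CI job) shows whether the PATH seen by pdoc contains one of the two names.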

You can test if your search index is precompiled by clicking the search box (so that the search index is fetched) and then checking your browser's developer console.

Search Index Size

The search index can be relatively large because it includes all docstrings. For larger projects, make sure that HTTP compression (https://en.wikipedia.org/wiki/HTTP_compression) and caching are enabled. search.js usually compresses to about 10% of its original size. For example, pdoc's own precompiled search index compresses from 312 kB to 27 kB.
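The effect of compression can be estimated locally before touching any server configuration. The sketch below gzips a synthetic stand-in for a search index and also works out the ratio for the figures quoted above. The sample documents are invented and far more repetitive than real docstrings, so the measured ratio is only illustrative; to measure a real build, read the bytes of the generated search.js instead.

```python
import gzip
import json


def compression_ratio(payload: bytes) -> float:
    """Return the gzip-compressed size divided by the original size."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic stand-in for a search index: many entries with repetitive docstrings.
documents = [
    {
        "fullname": f"pkg.module.func_{i}",
        "modulename": "pkg.module",
        "qualname": f"func_{i}",
        "kind": "function",
        "doc": "<p>Return the processed value for the given input argument.</p>",
    }
    for i in range(1000)
]
raw = json.dumps(documents).encode()
ratio = compression_ratio(raw)
print(f"{len(raw)} bytes raw, compressed to {ratio:.1%} of the original size")

# The figures quoted above for pdoc's own index: 27 kB out of 312 kB.
quoted_ratio = 27 / 312
print(f"quoted ratio: {quoted_ratio:.1%}")
```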

Disabling Search

If you wish to disable the search functionality, you can pass --no-search when invoking pdoc.

from __future__ import annotations

from collections.abc import Callable
from collections.abc import Mapping
import functools
import html
import json
from pathlib import Path
import shutil
import subprocess
import textwrap

import pdoc.doc
from pdoc.render_helpers import format_signature
from pdoc.render_helpers import to_html
from pdoc.render_helpers import to_markdown

def make_index(all_modules: Mapping[str, pdoc.doc.Module], is_public: Callable[[pdoc.doc.Doc], bool], default_docformat: str) -> list[dict]:
def make_index(
    all_modules: Mapping[str, pdoc.doc.Module],
    is_public: Callable[[pdoc.doc.Doc], bool],
    default_docformat: str,
) -> list[dict]:
    """
    This function compiles all currently documented modules into a list of documentation JSON objects,
    which can then be ingested by Elasticlunr.js.
    """

    documents = []
    for modname, module in all_modules.items():

        def make_item(doc: pdoc.doc.Doc, **kwargs) -> dict[str, str]:
            # TODO: We could be extra fancy here and split `doc.docstring` by toc sections.
            ret = {
                "fullname": doc.fullname,
                "modulename": doc.modulename,
                "qualname": doc.qualname,
                "kind": doc.kind,
                "doc": to_html(to_markdown(doc.docstring, module, default_docformat)),
                **kwargs,
            }
            return {k: v for k, v in ret.items() if v}

        # TODO: Instead of building our own JSON objects here we could also use module.html.jinja2's member()
        #  implementation to render HTML for each documentation object and then implement an elasticlunr tokenizer that
        #  removes HTML. It wouldn't be great for search index size, but the rendered search entries would be fully
        #  consistent.
        def make_index(mod: pdoc.doc.Namespace, **extra):
            if not is_public(mod):
                return
            yield make_item(mod, **extra)
            for m in mod.own_members:
                if isinstance(m, pdoc.doc.Variable) and is_public(m):
                    yield make_item(
                        m,
                        annotation=html.escape(m.annotation_str),
                        default_value=html.escape(m.default_value_str),
                    )
                elif isinstance(m, pdoc.doc.Function) and is_public(m):
                    if m.name == "__init__":
                        yield make_item(
                            m,
                            signature=format_signature(m.signature_without_self, False),
                        )
                    else:
                        yield make_item(
                            m,
                            signature=format_signature(m.signature, True),
                            funcdef=m.funcdef,
                        )
                elif isinstance(m, pdoc.doc.Class):
                    yield from make_index(
                        m,
                        bases=", ".join(x[2] for x in m.bases),
                    )
                else:
                    pass

        documents.extend(make_index(module))

    return documents

This function compiles all currently documented modules into a list of documentation JSON objects, which can then be ingested by Elasticlunr.js.
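To make the shape of these objects concrete, the following sketch builds one entry by hand. It mirrors the keys used in make_item above and the rule that keys with empty values are dropped. The helper name make_item_sketch and all sample values are made up for illustration, and no pdoc objects are involved.

```python
import html


def make_item_sketch(fullname, modulename, qualname, kind, doc_html, **extra):
    """Build one search document and drop every key whose value is empty."""
    ret = {
        "fullname": fullname,
        "modulename": modulename,
        "qualname": qualname,
        "kind": kind,
        "doc": doc_html,
        **extra,
    }
    return {k: v for k, v in ret.items() if v}


# A variable without a docstring: the empty "doc" key disappears from the entry.
variable_item = make_item_sketch(
    "demo.LIMIT",
    "demo",
    "LIMIT",
    "variable",
    "",
    annotation=html.escape(": int"),
    default_value=html.escape("<factory>"),  # escaped because it is rendered as HTML
)
print(variable_item)
```

Dropping empty values keeps the serialized index smaller, which matters because every entry is shipped to the browser.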

@functools.cache
def node_executable() -> str | None:
@functools.cache
def node_executable() -> str | None:
    # Prefer `nodejs`, then fall back to `node`; the result is cached for the lifetime of the process.
    if shutil.which("nodejs"):
        return "nodejs"
    elif shutil.which("node"):
        return "node"
    else:
        return None
def precompile_index(documents: list[dict], compile_js: pathlib.Path) -> str:
def precompile_index(documents: list[dict], compile_js: Path) -> str:
    """
    This function tries to precompile the Elasticlunr.js search index by invoking `nodejs` or `node`.
    If that fails, an unprocessed index will be returned (which will be compiled locally on the client side).
    If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.

    We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or
    – even better – a Python-based search index generation similar to
    [elasticlunr-rs](https://github.com/mattico/elasticlunr-rs) that could be shipped as part of pdoc.
    """
    raw = json.dumps(documents)
    try:
        node = node_executable()
        if node is None:
            raise FileNotFoundError("No such file or directory: 'node'")
        out = subprocess.check_output(
            [node, compile_js],
            input=raw.encode(),
            cwd=Path(__file__).parent / "templates",
            stderr=subprocess.STDOUT,
        )
        index = json.loads(out)
        index["_isPrebuiltIndex"] = True
    except Exception as e:
        if len(raw) > 3 * 1024 * 1024:
            print(
                f"pdoc failed to precompile the search index: {e}\n"
                f"Search will work, but may be slower. "
                f"This error may only show up now because your index has reached a certain size. "
                f"See https://pdoc.dev/docs/pdoc/search.html for details."
            )
            if isinstance(e, subprocess.CalledProcessError):
                print(f"{' Node.js Output ':=^80}")
                print(
                    textwrap.indent(e.output.decode("utf8", "replace"), "    ").rstrip()
                )
                print("=" * 80)
        return raw
    else:
        return json.dumps(index)

This function tries to precompile the Elasticlunr.js search index by invoking nodejs or node. If that fails, the unprocessed index is returned instead and compiled on the client side. If this happens and the index is rather large (>3MB), a warning with precompile instructions is printed.

We currently require nodejs, but we'd welcome PRs that support other JavaScript runtimes or – even better – a Python-based search index generation similar to elasticlunr-rs that could be shipped as part of pdoc.
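The fallback behaviour described above can be tried without Node.js. The sketch below is not pdoc's implementation: it swaps the JavaScript compile step for a tiny Python one-liner (FAKE_COMPILER) so that it runs anywhere, and the names precompile_sketch and warn_above are hypothetical. It only illustrates the pattern: pipe the raw JSON to an external process, mark the result as prebuilt on success, and return the raw JSON on any failure, warning only above a size threshold.

```python
import json
import subprocess
import sys

# Stand-in "compiler": reads JSON documents from stdin and writes a JSON object
# to stdout. pdoc itself runs a JavaScript file with Node.js at this step.
FAKE_COMPILER = (
    "import json, sys; "
    "docs = json.load(sys.stdin); "
    "json.dump({'documentCount': len(docs)}, sys.stdout)"
)


def precompile_sketch(documents, command, warn_above=3 * 1024 * 1024):
    """Try to precompile; return the raw JSON if the external command fails."""
    raw = json.dumps(documents)
    try:
        out = subprocess.check_output(
            command, input=raw.encode(), stderr=subprocess.STDOUT
        )
        index = json.loads(out)
        index["_isPrebuiltIndex"] = True
    except Exception as e:
        # Stay quiet for small indexes, where client-side building is cheap.
        if len(raw) > warn_above:
            print(f"precompilation failed: {e}")
        return raw
    return json.dumps(index)


docs = [{"fullname": "demo.f", "kind": "function"}]
ok = json.loads(precompile_sketch(docs, [sys.executable, "-c", FAKE_COMPILER]))
fallback = json.loads(precompile_sketch(docs, ["no-such-runtime-xyz"]))
print(ok)
print(fallback)
```

Because both paths return a JSON string, the caller does not need to know which one was taken; the _isPrebuiltIndex key is the only marker that distinguishes them.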