DocuMan Release Notes

v2026.06.08

2026-06-08

Fix AI cabinet settings not loading saved indexes

Critical follow-up to v2026.06.07. Saving a cabinet's AI settings (including selected indexes) returned a green "AI Settings Saved" confirmation, but reopening the cabinet showed the indexes empty again. The write actually succeeded — PostgreSQL stored the JSONB array correctly — but the read path returned the value to the frontend as a raw JSON string instead of an array. The browser's Array.isArray check rejected the string and fell back to an empty list, making it look like nothing had saved.

The fix registers a JSON / JSONB codec on the asyncpg connection pool (backend/app/database.py) so reads and writes both round-trip through Python lists and dicts cleanly. The four callers that previously pre-serialized with json.dumps were updated to pass their values directly. Audit logging, login-failure tracking, the Review Queue Fix endpoint, and cabinet AI settings all share the same codec now.
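
For readers curious what such a codec registration can look like, here is a minimal sketch. `Connection.set_type_codec` and the `init` argument of `asyncpg.create_pool` are real asyncpg APIs; the helper name `init_connection` is an assumption for illustration and is not taken from backend/app/database.py.

```python
import json


async def init_connection(conn):
    """Register JSON and JSONB codecs on one asyncpg connection.

    Hypothetical helper: meant to be passed as
    asyncpg.create_pool(dsn, init=init_connection) so that every pooled
    connection decodes json/jsonb columns into Python lists and dicts.
    """
    for type_name in ("json", "jsonb"):
        await conn.set_type_codec(
            type_name,
            encoder=json.dumps,  # Python value -> JSON text on write
            decoder=json.loads,  # JSON text -> Python value on read
            schema="pg_catalog",
        )
```

With a codec like this in place, a caller passes a list such as `["Company", "Amount"]` directly as the query parameter. Calling json.dumps on it first would store a JSON string rather than an array, which is why the pre-serializing callers had to change.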

No data was lost — index selections that appeared empty on reload are still in the database. After upgrading, reopen any affected cabinet and the saved indexes will display as expected.

Security Browser close now ends the session

Auth tokens now live in sessionStorage instead of localStorage. Closing the browser entirely (not just the tab) wipes the session, so reopening the app lands on the login screen. Previously the token persisted across browser restarts. The username-remember preference still uses localStorage — that's a UX convenience, not auth state. Orphan tokens left over in localStorage from earlier installs are scrubbed on next sign-out.

Improvement <MatchCheckCompany> is now a selectable index

The folder-matching token has existed in the worker and the variable autocomplete for a while, but it was missing from the index catalog, so there was no way for an operator to opt into it. Added it as a Document Fields index. Because it shares ai_field=company with <Company> and <CompanyFirstLetter>, the primary extraction call still asks for the company name exactly once. A tiny second-pass AI call fires only when the resolved template actually contains <MatchCheckCompany> — it picks the best existing sibling folder under the parent path (e.g. "Minink Materials, LLC" reuses an existing "Minink" folder instead of creating a separate one).
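
The deduplication and the conditional second pass described above can be sketched roughly as follows. The catalog dictionary, the function names, and the `amount` field name are hypothetical; only the shared ai_field=company relationship comes from the notes.

```python
# Hypothetical catalog: template token -> underlying AI field.
INDEX_AI_FIELD = {
    "Company": "company",
    "CompanyFirstLetter": "company",
    "MatchCheckCompany": "company",
    "Amount": "amount",  # assumed field name, for illustration only
}


def fields_to_request(selected_indexes):
    """Return each underlying AI field once, in first-seen order."""
    fields = []
    for name in selected_indexes:
        field = INDEX_AI_FIELD[name]
        if field not in fields:
            fields.append(field)
    return fields


def needs_folder_match(template):
    """The second-pass call is only worth making when the token is present."""
    return "<MatchCheckCompany>" in template
```

Selecting all three company tokens still yields a single `company` entry in the request, and a template without <MatchCheckCompany> never triggers the extra call.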

Improvement Help sidebar shows selected document fields

The right-side help reference in the Cabinet Properties dialog now includes a Document fields section listing every field-type index you've selected (<Company>, <CompanyFirstLetter>, <Amount>, the ID tokens, your custom indexes, …). The list stays gated to your selection — unpicked variables don't appear, so the panel still only shows what's actually wired up to resolve in templates.

Improvement Scan Time token catalog cleaned up

Removed redundant aliases from the <Now*> tokens: <NowYear>, <NowMonth>, <NowDay>, and <NowISO> are gone. Use the canonical short forms instead: <NowYYYY>, <NowMM>, <NowDD>. Templates that used the old tokens will no longer resolve them — switch to the canonical names. Every other Now token (date, time, hour, minute, second, quarter, week number, month name, day name, millis / micros / nanos, full epoch values) is unchanged.
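
If you have many templates to update, a small script along these lines may help. It is a sketch, not a DocuMan tool: the pairing of old to new names is an assumption based on the order the tokens are listed above, and because no replacement is stated for <NowISO> the script only reports it.

```python
import re

# Assumed pairing, inferred from the order the tokens are listed.
RENAMED = {"NowYear": "NowYYYY", "NowMonth": "NowMM", "NowDay": "NowDD"}
# No replacement is stated for this token, so it is left in place and reported.
REMOVED_WITHOUT_REPLACEMENT = {"NowISO"}

TOKEN = re.compile(r"<(\w+)>")


def migrate_template(template):
    """Return (rewritten_template, tokens_needing_manual_attention)."""
    unresolved = []

    def swap(match):
        name = match.group(1)
        if name in REMOVED_WITHOUT_REPLACEMENT:
            unresolved.append(name)
        return "<" + RENAMED.get(name, name) + ">"

    rewritten = TOKEN.sub(swap, template)
    return rewritten, unresolved
```

For example, `migrate_template("/<NowYear>/<NowMonth>")` rewrites both tokens, while a template containing <NowISO> comes back unchanged at that position and listed for manual review.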

v2026.06.07

2026-06-07

Improvement Cabinet AI dialog: per-variable index picker, independent column scroll, default Company

Refinements to the AI Watch Folder index system introduced in v2026.06.05:

Per-variable indexes. Every template variable is now its own selectable index — <Company> and <CompanyFirstLetter> are separate checkboxes, as are <DocYear>, <DocMM>, <DocQuarter> and the other date components. The Document Date section in the picker is sub-grouped (Full date forms / Year / Month / Day / Quarter & week) so the longer list stays readable. The AI request still dedupes by underlying field, so selecting many date variables still asks the model for the date exactly once.
Custom (per-cabinet) indexes. The Choose Indexes dialog has a new Custom (this cabinet) section with + Add custom index. Each custom index has a name (the template token, e.g. PatientChartNumber), a display label, and a description that becomes the AI's extraction rule. Custom variables show up in the < autocomplete on both template inputs.
Independent column scroll on the AI tab. The help sidebar on the right and the form on the left now scroll independently — reading the help reference no longer drags the form past where you were editing. The whole Cabinet Properties dialog also grew a bit (1340 px, 96% of viewport height).
Help reference filters to your selection. The Filing help on the right side only lists variables from indexes the cabinet has actually selected, with a standout callout at the top making the rule obvious. No more wading through 80 tokens to find the three you've picked.
"Document fields" help section removed. Every variable previously listed there is now an index — no point listing them twice.
Cleaner index chips. Selected indexes display as rectangular tags on the AI tab (light blue for AI-extracted, grey for computed, dashed orange for custom) instead of pill-shaped chips.
New cabinets default-select Company. The Indexes section is no longer empty on first open — Company is pre-checked and can be unchecked like any other index if the cabinet doesn't need it.
Review Queue: combined date input. No matter how many date variables a cabinet has selected, the review-queue editor shows one Document Date field; the backend re-derives year, month, day, quarter, week, etc. from that single value when you click Fix. Non-date variables each have their own editable row, and <CompanyFirstLetter> stores under its own key so you can override the derived first letter independently of the full company name.
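
To illustrate how a single corrected Document Date can fan out into the individual date variables, here is a minimal sketch. The function name and the output formats (zero padding, ISO week numbering) are assumptions; the token names follow the document-date set listed under v2026.06.03.

```python
from datetime import date


def derive_date_fields(doc_date: date) -> dict:
    """Derive date components from one Document Date value.

    Formats are assumptions: zero-padded numbers and ISO-8601 weeks.
    """
    return {
        "DocYYYY": f"{doc_date.year:04d}",
        "DocYY": f"{doc_date.year % 100:02d}",
        "DocMM": f"{doc_date.month:02d}",
        "DocDD": f"{doc_date.day:02d}",
        "DocQuarter": str((doc_date.month - 1) // 3 + 1),
        "DocWeekNumber": f"{doc_date.isocalendar()[1]:02d}",
    }
```

Because everything is computed from one value, correcting the date once keeps year, month, quarter, and week consistent with each other.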

No reconfiguration required after upgrading: migrations 064-067 re-parse each cabinet's folder + filename templates and pre-populate the matching per-variable indexes automatically. Existing v2026.06.05 installs skip the already-applied migrations and pick up only the code changes.

v2026.06.05

2026-06-05

New AI extraction "indexes" — pick exactly what the AI sees

Cabinet AI is now driven by a small list of indexes you opt into per cabinet — each index covers one logical field (Company, Document Date, Invoice Number, Amount, etc.). Only the indexes you select are requested from the AI, and only their template variables resolve in your folder + filename templates. Computed indexes (Scan Time, Import Date, Original Filename) are always available with no AI cost.

New fields: Invoice Number, PO Number, Account Number, Reference / Tracking Number, Customer / Patient ID, Vendor Code, Description / Summary, Subject / Title, and Currency — with matching <InvoiceNumber>, <PONumber>, etc. template variables.
Choose Indexes… dialog: a new picker (Cabinet Properties → AI Auto-Import → Indexes) lets you check-mark exactly which indexes this cabinet uses. Selected indexes appear as chips on the AI tab.
Smarter Document Date: the AI is only asked for the date once (as ref_date); every Doc* subfield (<DocYear>, <DocQuarter>, <DocWeekNumber>, …) and all smart-date variants are derived locally. Nothing is requested from the AI that can be computed without it.
Cleaner system prompt: the JSON response schema is generated automatically from your selected indexes and appended to the prompt. The prompt textarea is now just for describing what kind of documents this cabinet files — no more hand-maintaining the JSON shape.
Automatic migration: existing cabinets get their selected_indexes pre-populated by parsing the tokens already in use in their folder + filename templates. No manual reconfiguration after the update.
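
The template parsing behind the automatic migration can be pictured with a sketch like the one below. The regular expression and function name are assumptions; the real migration also maps each token to its index, which is not shown here.

```python
import re

# Assumed token shape: angle brackets around a letter-led alphanumeric name.
TOKEN = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")


def tokens_in_use(folder_template, filename_template):
    """Collect distinct template tokens, in first-seen order."""
    found = []
    for template in (folder_template, filename_template):
        for name in TOKEN.findall(template):
            if name not in found:
                found.append(name)
    return found
```

A cabinet whose templates mention <Company> twice and <InvoiceNumber> once would end up with each token listed a single time.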

New Editable indexes + "Fix" on the Review Queue

When the AI's extraction needs a correction, you can now edit the values inline on each pending review-queue item and click Fix to re-run the folder + filename templates with the corrected values — no extra AI call. The filename is always editable independently, and any date you correct is shown as a single date input (the worker derives year, month, quarter, week-number, … locally).

After you approve a document, every index value is mirrored into its document tags so the same data stays queryable on the document itself long after the review-queue row is gone.

v2026.06.03

2026-06-03

New Smarter AI Watch Folder template variables

The folder- and filename-template fields now support a much wider set of variables, with an autocomplete that surfaces them as you type and an inline help panel that documents every one with an example.

IntelliType (< autocomplete) — type < in either template field and a menu of every supported variable appears, filtered as you keep typing (<Doc narrows to the document-date set, etc.). Arrow keys + Enter to insert, Escape to cancel.
Richer document-date set — <DocYYYY>, <DocYY>, <DocMM>, <DocDD>, <DocMonthName>, <DocDayName>, <DocQuarter>, <DocWeekNumber>, and <DocISO> alongside the existing <DocDate> family.
Scan-time ("Now") variables with nanosecond precision — <Now>, <NowDate>, <NowTime>, <NowISO>, all the date/time component tokens (<NowYYYY>, <NowMM>, <NowHH>, …), plus <NowMillis> / <NowMicros> / <NowNanos> and full epoch values up to <NowEpochNanos> for unique-by-time filenames.
<MatchCheckCompany> — a new variable that fuzzy-matches the AI-extracted company name against existing subfolders. Example template /Packing Slips/<CompanyFirstLetter>/<MatchCheckCompany> with an AI-extracted company of "Minink Materials, LLC" files into the existing Minink folder instead of creating a separate "Minink Materials, LLC" folder. A strict deterministic second AI call picks from the candidate list, with post-filtering so the model can't return a name that wasn't offered. Triggers only when this variable is in the template — no extra AI cost otherwise.

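The post-filtering step mentioned for <MatchCheckCompany> can be sketched as a simple membership check on the model's answer. The function name, the None fallback, and the case-insensitive comparison are assumptions made for illustration.

```python
def pick_existing_folder(model_answer, candidates):
    """Accept the model's answer only if it is one of the offered folders.

    Returns the matching candidate as it exists on disk, or None so the
    caller can fall back (for example, to creating a new folder).
    Case-insensitive comparison is an assumption.
    """
    wanted = model_answer.strip().casefold()
    for folder in candidates:
        if folder.casefold() == wanted:
            return folder
    return None
```

With candidates `["Minink", "Acme"]`, an answer of `"minink"` resolves to the existing `Minink` folder, while a name that was never offered is rejected.
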
Improvement AI cabinet settings can't be configured until enabled

Every field on a cabinet's AI Auto-Import tab (watch folders, templates, prompt, auto-approve, vision settings) is now disabled until the Enable AI Auto-Import toggle is on. A small notice points at the toggle. This makes the "fill everything in but forget to flip the switch" mistake impossible.

Improvement Watch folder root mode is no longer a footgun

Creating a Watch Folder without picking a target folder used to silently file loose files into an auto-created Inbox folder — with no UI hint that this would happen. Administration → Watch Folders now:

Shows a clear banner in the add-folder dialog explaining that root-mode files go to an auto-created Inbox folder in the cabinet and that subdirectories become top-level folders.
Shows "→ Inbox (auto)" as the target of root-mode rows in the watch-folder list, with a tooltip describing the behavior.

Fix Folder permission inheritance UI

The folder-properties Permissions tab had several bugs that combined to make the "Inherit permissions from parent" checkbox feel broken. All addressed in this release:

Route shadowing bug — the /permissions/folder-inheritance/{id} endpoint was being shadowed by a more-general catch-all route registered first, so the API silently returned an empty list. This made the checkbox always appear unchecked regardless of the database state. Routes are now ordered so specific paths win.
Inherited permissions are now displayed in the folder properties (Windows-Explorer style). When a folder inherits, the permission table shows the inherited rows directly, dimmed and read-only, with a caption that says "Showing permissions inherited from cabinet 'X'." No more clicking back up to the parent to find out what's in effect.
Indeterminate checkbox — the checkbox could render as a "—" dash with no apparent way to interact with it. The state is now coerced to a clean boolean and Quasar's tri-state click cycle is disabled.
"Break inheritance" Cancel button — the confirmation dialog's Cancel button used to mean "no, don't copy parent perms — but still break inheritance," which was a trap. The flow is now two confirmations: the first asks whether to break inheritance at all (Cancel is a true no-op), the second chooses the starting permission set.

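The route shadowing bug described above is easy to reproduce with any first-match router. The notes do not name the web framework, so the sketch below uses a tiny hand-written router; the catch-all pattern and handler names are hypothetical.

```python
import re


def make_router(routes):
    """routes: list of (regex pattern, handler name); first match wins."""
    compiled = [(re.compile(pattern), name) for pattern, name in routes]

    def resolve(path):
        for pattern, name in compiled:
            if pattern.fullmatch(path):
                return name
        return None

    return resolve


# Hypothetical general route that also matches the specific path below.
CATCH_ALL = (r"/permissions/[^/]+/[^/]+", "list_permissions")
SPECIFIC = (r"/permissions/folder-inheritance/[^/]+", "folder_inheritance")

shadowed = make_router([CATCH_ALL, SPECIFIC])  # general route registered first
fixed = make_router([SPECIFIC, CATCH_ALL])     # specific route registered first
```

In the `shadowed` ordering a request for `/permissions/folder-inheritance/42` is answered by the general handler; in the `fixed` ordering it reaches the intended one.
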
Fix Spurious "permission denied" toast on file rename

Non-admin users whose access came via group membership saw a red "You do not have permission to perform this action" notification when they opened a document's Properties dialog, even though the rename itself worked. The cause was a pair of separate requests to /admin/users and /admin/groups, fired to populate the permission-picker dropdowns. Both are admin-only endpoints that return 403 for non-admins, which tripped the global error toast. Those requests are now skipped for non-admin users (the dropdowns are only useful to admins anyway).

v2026.05.28

2026-05-28

Improvement Cleaner Settings page

The admin Settings page no longer duplicates options that already have their own dedicated pages. Those settings are still fully editable — just from a single place each — which removes the risk of two editors disagreeing about the same value. The general Settings table now stays focused on truly general options (OCR, uploads, session timeout, and the like).

Backup settings (destination, schedule, retention, pg_dump path) are managed only on Administration → Backups.
Consistency-check settings (schedule, email-on-failure, pg_amcheck path) are managed only on Administration → Consistency.
Directory / single sign-on settings (LDAP, Active Directory, and Microsoft Entra ID / Azure) are managed only on Administration → Active Directory.
These now follow the same rule the SSL/certificate settings already did — if a setting is owned by a dedicated page, it no longer appears in the general Settings table.

v2026.05.27

2026-05-27

Security Permission hardening

This release tightens how per-user permissions are enforced across cabinet visibility, search, and login. Upgrading is recommended for all installations.

Cabinet visibility — users no longer see the names of cabinets they have no permission on. This applies to the left navigation tree, the search-page cabinet filter, and the AI Review Queue cabinet filter.
Search — full-text and advanced search results are now strictly scoped to the caller's permissions. Hits from cabinets the user has no access to are no longer returned, even when a cabinet or folder ID is supplied directly.
Login — a valid Active Directory account alone is no longer sufficient to obtain a session. Users must be a system administrator or have at least one cabinet permission (direct or via group membership) for the login to succeed. Rejected attempts are recorded in the audit log.
Per-login group sync (AD & Azure SSO) — every successful sign-in now re-reads the user's directory group memberships and reconciles them in DocuMan. Adding a user to a permitted group in AD takes effect on their very next sign-in; removing them revokes access immediately. Manually-managed (non-AD) group memberships are untouched.
UI affordances — the "New Cabinet" and "New Folder" buttons are now greyed out for users without the corresponding permission, and "Review Queue" / "Folder Amounts" are hidden when not applicable.

No configuration changes are required after upgrading. Existing permission grants continue to work; the gates above simply enforce them where they previously weren't.
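
The login gate and the per-login group reconciliation can be pictured with the sketch below. The data shapes, the `"ad"` / `"manual"` source labels, and both function names are assumptions; only the rules themselves come from the notes above.

```python
def may_log_in(is_system_admin, cabinet_permission_count):
    """A valid directory account alone is not enough to get a session."""
    return is_system_admin or cabinet_permission_count > 0


def reconcile_groups(current, directory_groups):
    """current: dict of group name -> source ("ad" or "manual").

    Directory-sourced memberships are replaced by what the directory
    reports at this sign-in; manually managed memberships are kept.
    """
    synced = {name: src for name, src in current.items() if src == "manual"}
    for name in directory_groups:
        synced.setdefault(name, "ad")
    return synced
```

A user who was in the directory group Accounting and is now only in Legal ends up with Legal plus any manually assigned groups after the next sign-in.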

Performance Faster search for non-admin users

On large document collections, search for non-admin users could take 20-30 seconds even for queries that returned only a handful of results. The cause was the per-row permission check expanding into a sequential scan of the entire document table instead of being evaluated only on the matching candidates. Two changes:

Search now computes the user's set of accessible folders once per query via a new fn_accessible_folder_ids helper and filters by set membership, instead of calling the per-folder permission check for every row. On a 21,500-document dataset this turns a ~29s search into a sub-second one and scales linearly with the matching result set rather than the table size.
The highlighted result snippet is now generated only for the documents actually shown on the current page (typically 50) instead of for every document that matched the query.

The change is rule-for-rule equivalent to the previous permission model: direct grants, group grants, folder-to-folder inheritance, cabinet-to-folder inheritance, and permissions_inherited=FALSE blocks all behave the same. No configuration changes required.
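
As a rough illustration of the two changes, the sketch below filters candidates by set membership and builds snippets only for the visible page. It is plain Python standing in for the SQL; `accessible_folder_ids` plays the role of the result of fn_accessible_folder_ids, and everything else is hypothetical.

```python
def search_page(matches, accessible_folder_ids, page_size=50):
    """matches: iterable of (doc_id, folder_id) candidates from the index.

    The accessible folder set is computed once per query by the caller
    and used here as a set-membership filter, instead of running a
    permission check per row.
    """
    allowed = set(accessible_folder_ids)
    visible = [doc for doc, folder in matches if folder in allowed]
    page = visible[:page_size]
    # Snippets are built only for the page being shown, not for every match.
    snippets = {doc: f"snippet for document {doc}" for doc in page}
    return visible, snippets
```

The cost of the permission filter now depends on the number of candidates, and the cost of snippet generation on the page size.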

Fix OCR no longer silently drops pages

Previously, when OCR produced no text for a page (a blank scan or a page Tesseract couldn't read), that page was dropped entirely and the document was still marked fully processed — so a 300-page PDF could end up with only 254 searchable pages and no indication anything was missing. Now:

Every page is recorded, including blank ones, so coverage gaps are tracked instead of hidden.
File Properties shows OCR coverage (e.g. "254 / 300 pages (85%)") and a Partial badge when a document didn't fully OCR.
Right-click a document for Re-OCR Document (full re-process) or Re-OCR Missing Pages, which retries only the blank pages at a higher resolution — far cheaper than re-processing the whole file.
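
The coverage figure and the list of pages a Re-OCR Missing Pages run would retry can be derived from per-page records, as in this sketch. The input shape, function name, and rounding rule are assumptions.

```python
def ocr_coverage(page_has_text):
    """page_has_text: list of booleans, one entry per page, blanks included."""
    total = len(page_has_text)
    done = sum(page_has_text)
    percent = round(100 * done / total) if total else 0
    label = f"{done} / {total} pages ({percent}%)"
    # 1-based page numbers that produced no text and could be retried.
    missing = [i + 1 for i, ok in enumerate(page_has_text) if not ok]
    partial = done < total
    return label, missing, partial
```

For the example above, 254 of 300 pages is 84.7%, which rounds to the 85% shown in File Properties.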

v2026.05.25

2026-05-25

New Database consistency self-check

A new admin page (Administration → Consistency) runs four complementary, non-destructive health checks on the database, on a schedule you configure, and surfaces the results in the admin UI.

VACUUM ANALYZE per table — catches heap-tuple xmin/xmax inconsistencies (the class of corruption that can hide for days after a crash).
pg_amcheck --heapallindexed — B-tree index integrity, including entries pointing at non-existent heap tuples.
TOAST integrity via pg_dump — the gold-standard test for missing TOAST chunks (large-value storage). A successful dump means every row could be fully materialised.
FK orphan scan — DocuMan-specific check across seven relationships (orphan pages, tags, notes, review-queue rows, etc.) that don't have CASCADE constraints.

Schedule it daily or weekly; weekly is recommended because the TOAST scan is I/O-heavy on large databases. A "Run Check Now" button is always available for ad-hoc checks. Each run records pass/fail per check with summaries, viewable in a per-row Details dialog. You can optionally set an email address to notify on failure.
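
An FK orphan scan is essentially an anti-join between a child table and its parent. The sketch below demonstrates the idea with an in-memory SQLite database as a stand-in for PostgreSQL; the table and column names are hypothetical and not DocuMan's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY);
    CREATE TABLE pages (id INTEGER PRIMARY KEY, document_id INTEGER);
    INSERT INTO documents (id) VALUES (1);
    INSERT INTO pages (id, document_id) VALUES (10, 1), (11, 99);
    """
)

# Anti-join: pages whose parent document does not exist.
ORPHAN_PAGES = """
    SELECT p.id
    FROM pages AS p
    LEFT JOIN documents AS d ON d.id = p.document_id
    WHERE d.id IS NULL
    ORDER BY p.id
"""
orphans = [row[0] for row in conn.execute(ORPHAN_PAGES)]
```

Here page 11 points at document 99, which does not exist, so it is the only orphan reported.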

Fix Cross-cabinet folder move

Moving a folder to a different cabinet via right-click → Move To… previously failed with a generic 400 error. The underlying SQL function referenced a column (updated_by) that had never been added to the folders table. This release adds the column, backfills historical rows from created_by, and patches every folder-mutating function to populate it consistently — completing the created_by/updated_by/deleted_by audit trio.

New PostgreSQL tuning admin page

A new Administration → PostgreSQL Tuning page exposes the five memory and planner parameters that most affect search and OCR throughput (shared_buffers, effective_cache_size, work_mem, maintenance_work_mem, random_page_cost). The page shows the current value, a recommended value computed from the host's RAM (with a 10 GB reserve for the OS and DocuMan's own services), and an Apply button that runs ALTER SYSTEM SET for each parameter and reloads PostgreSQL. When the docman database role isn't a superuser, the page returns the exact psql commands for the operator to run as the postgres superuser instead. The defaults that ship with the PostgreSQL installer (128 MB shared_buffers, random_page_cost = 4.0) are tuned for tiny databases on spinning disks — not for a real DocuMan corpus.
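
When the page hands back psql commands instead of applying them, the output might resemble what this sketch generates. `ALTER SYSTEM SET` and `pg_reload_conf()` are standard PostgreSQL; the function name and the example values are placeholders, not DocuMan's recommendations.

```python
def tuning_commands(settings):
    """Build statements an operator could run as the postgres superuser."""
    commands = [
        f"ALTER SYSTEM SET {name} = '{value}';"
        for name, value in settings.items()
    ]
    # Ask the server to re-read its configuration.
    commands.append("SELECT pg_reload_conf();")
    return commands


# Placeholder values for illustration only.
example = tuning_commands({"work_mem": "64MB", "random_page_cost": "1.1"})
```

One point worth verifying in your environment: in stock PostgreSQL, a change to shared_buffers is only picked up after a server restart, whereas the other four parameters take effect on reload.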

Improvement Search speed on large databases

Searches for common terms on large databases (25,000+ documents, 1+ million pages) were taking 10+ seconds. The cause was an OR clause that spanned two tables in the search function: PostgreSQL can't combine GIN bitmaps from two different relations, so it fell back to a single-pass plan that re-computed a tsvector on every document row. Both search functions are rewritten to gather candidate documents via a UNION of single-table indexed lookups (each leg now uses its own GIN index), then narrow to the candidate set, then pick the best matching page per document for the snippet. The unused substring-match legs are pruned from the plan entirely via SET LOCAL plan_cache_mode = force_custom_plan. On large corpora this turns multi-second searches into sub-second ones, especially when combined with the new PostgreSQL tuning recommendations above.