Debugging GroundX On-Prem

This page describes the general data-flow model of GroundX On-Prem and some key approaches to debugging a GroundX On-Prem deployment.

Observability

We recommend installing the Kubernetes Metrics Server, for example with the following command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Alternatively, install a monitoring tool such as Prometheus. Either option lets you monitor CPU and memory usage per pod and per node, so you can identify failures caused by inadequate resources.
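
Once the Metrics Server is running, `kubectl top` reports live usage. The helper below is a minimal sketch that prints node usage and then the pods in the `eyelevel` namespace (the namespace used by the MySQL commands on this page), sorted by memory. The function name and the `NAMESPACE` override are illustrative and not part of GroundX.

```shell
# Hypothetical helper: summarize CPU and memory usage for the cluster.
# Assumes the Metrics Server is installed and GroundX runs in the
# "eyelevel" namespace (set NAMESPACE to override).
show_resource_usage() {
  ns="${NAMESPACE:-eyelevel}"
  echo "== Node usage =="
  kubectl top nodes
  echo "== Pod usage in namespace $ns (highest memory first) =="
  kubectl top pods -n "$ns" --sort-by=memory
}
```

Run `show_resource_usage` while an ingest job is in progress. Pods close to their memory limit, or nodes close to capacity, are good first suspects when ingest stalls.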

Profiling Ingest flow

When you upload a document to GroundX On-Prem, its data passes through the following pods, in order, before ingestion completes:

groundx > upload (if the request contains a URL) >
queue > pre-process > layout-api > layout-process > layout-correct >
layout-ocr + layout-inference > layout-map >
layout-save > layout-webhook > pre-process >
summary-client > summary-api > summary-inference > process
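
Before reading logs, it can help to confirm that every pod in this chain is running. The sketch below assumes two things you should check against your deployment: each pod's name starts with the stage name shown above (for example `layout-api-...`), and GroundX runs in the `eyelevel` namespace. For each stage, it prints the readiness, status, and restart count of the matching pods, or MISSING when no pod matches.

```shell
# Hypothetical helper: report the pods behind each stage of the ingest chain.
# Assumes pod names start with "<stage>-" and the namespace is "eyelevel".
INGEST_STAGES="groundx upload queue pre-process layout-api layout-process
layout-correct layout-ocr layout-inference layout-map layout-save
layout-webhook summary-client summary-api summary-inference process"

check_ingest_pods() {
  ns="${NAMESPACE:-eyelevel}"
  # Default columns without the header: NAME READY STATUS RESTARTS AGE.
  pods=$(kubectl get pods -n "$ns" --no-headers)
  for stage in $INGEST_STAGES; do
    # Anchor on "<stage>-" so "process" does not match "pre-process-...".
    matches=$(printf '%s\n' "$pods" | grep "^${stage}-" || true)
    if [ -z "$matches" ]; then
      echo "$stage: MISSING"
      continue
    fi
    printf '%s\n' "$matches" | while read -r name ready status restarts _rest; do
      echo "$stage: $name ready=$ready status=$status restarts=$restarts"
    done
  done
}
```

A stage reported as MISSING or not ready, or one with a climbing restart count, is a likely place for documents to stall.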

Pods communicate through Kafka topics, Celery queues backed by Redis, and direct API requests. The Kafka topics are specified here. Below is the same flow, annotated with the mechanism each pod uses to hand work to the next:

groundx > [kafka file-upload] >
upload (if the request contains a URL) > [kafka file-update] >
queue > [kafka file-pre-process] >
pre-process > [api request] >
layout-api > [redis-celery-queue process_queue] >
layout-process > [redis-celery-queue correct_queue] >
layout-correct > [redis-celery-queue ocr_queue + layout_queue] >
layout-ocr + layout-inference > [redis-celery-queue map_queue] >
layout-map > [redis-celery-queue save_queue] >
layout-save > [api request] >
layout-webhook > [kafka file-pre-process] >
pre-process > [kafka file-summary] >
summary-client > [api request] >
summary-api > [redis-celery-queue ] >
summary-inference > [kafka file-process] >
process
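
The layout stages hand work to each other through Celery queues in Redis, so a queue that keeps growing points to the stage that consumes it. Celery's Redis broker normally stores each queue as a Redis list named after the queue, so `LLEN` gives its depth. This is a minimal sketch under several assumptions. `REDIS_POD` is a hypothetical variable holding the name of a Redis pod in the `eyelevel` namespace; look up the real name with `kubectl get pods -n eyelevel`. The sketch also assumes `redis-cli` is available in that container.

```shell
# Hypothetical helper: print how many tasks wait in each layout Celery queue.
# Assumes REDIS_POD names a pod in the "eyelevel" namespace that has
# redis-cli, and that Celery keeps each queue in a Redis list of that name.
LAYOUT_QUEUES="process_queue correct_queue ocr_queue layout_queue map_queue save_queue"

celery_queue_depths() {
  ns="${NAMESPACE:-eyelevel}"
  if [ -z "$REDIS_POD" ]; then
    echo "set REDIS_POD to the name of your Redis pod" >&2
    return 1
  fi
  for q in $LAYOUT_QUEUES; do
    # LLEN returns the number of pending messages in the list.
    depth=$(kubectl exec -n "$ns" "$REDIS_POD" -- redis-cli llen "$q")
    echo "$q: $depth"
  done
}
```

Suppose a queue's depth keeps rising while the pods that consume it sit idle or restart (for example, layout-ocr for ocr_queue). That consuming stage is then a likely bottleneck. If your Redis instance requires a password, pass it with the `-a` option of `redis-cli`. The summary-api queue is omitted because its name is not given in the flow.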

When debugging, it’s often best to start with a specific document ID. For example, the ingest endpoint returns a processId, which you can pass to the get_processing_status_by_id endpoint to retrieve the document IDs in that request. You can then follow a document ID through the logs of each pod and Kafka topic in the ingest pipeline to find the point where processing stops. Combined with resource metrics, this approach is enough to diagnose most ingestion issues. In practice, GroundX On-Prem most often fails because pods in the ingest pipeline have been allocated insufficient resources.
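
The log walk can be partly scripted. The sketch below counts, for every pod in the namespace, how many recent log lines mention a given document ID. This quickly shows the last stage that saw the document. It assumes the `eyelevel` namespace and that the pipeline pods write document IDs verbatim to their logs. The 24-hour `SINCE` window is an arbitrary, hypothetical default.

```shell
# Hypothetical helper: count log lines that mention a document ID, per pod.
# Assumes the "eyelevel" namespace and that pipeline pods log document IDs.
trace_document() {
  doc_id="$1"
  ns="${NAMESPACE:-eyelevel}"
  since="${SINCE:-24h}"
  if [ -z "$doc_id" ]; then
    echo "usage: trace_document DOCUMENT_ID" >&2
    return 1
  fi
  # "kubectl get pods -o name" prints one "pod/<name>" line per pod.
  for pod in $(kubectl get pods -n "$ns" -o name); do
    hits=$(kubectl logs -n "$ns" "$pod" --all-containers=true --since="$since" 2>/dev/null \
      | grep -cF -- "$doc_id" || true)
    # Report only pods whose recent logs mention the document.
    if [ "${hits:-0}" -gt 0 ]; then
      echo "${pod#pod/}: $hits"
    fi
  done
}
```

For example, `trace_document e139aa7a-81bb-44cb-8eb8-a4fb172835cf`. The last stage in the ingest flow that reports hits is usually where to look next. If a pod has restarted, its earlier logs may only be available with `kubectl logs --previous`.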

Profiling Data

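GroundX On-Prem includes a MySQL database. To access it, open a shell in the database pod and start the MySQL client, replacing DB_USER and DB_PASSWORD with your database credentials: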

kubectl -n eyelevel exec -it mysql-cluster-pxc-db-pxc-0 -- bash
mysql -u DB_USER -pDB_PASSWORD eyelevel
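
For scripting, you can run the same client without an interactive shell by passing the statement with `mysql -e`. This minimal sketch assumes the pod and database names shown above. It reads the credentials from `DB_USER` and `DB_PASSWORD` environment variables, mirroring the placeholders in the commands above.

```shell
# Hypothetical helper: run one SQL statement against the GroundX database.
# Assumes the pod and database names shown above; DB_USER and DB_PASSWORD
# must be exported with your database credentials.
MYSQL_POD="${MYSQL_POD:-mysql-cluster-pxc-db-pxc-0}"

eyelevel_sql() {
  if [ -z "$DB_USER" ] || [ -z "$DB_PASSWORD" ]; then
    echo "export DB_USER and DB_PASSWORD first" >&2
    return 1
  fi
  # -e runs the statement and exits, so no TTY (-it) is needed.
  kubectl -n eyelevel exec "$MYSQL_POD" -- \
    mysql -u "$DB_USER" -p"$DB_PASSWORD" eyelevel -e "$1"
}
```

For example, `eyelevel_sql "show tables;"` lists the tables in the eyelevel database.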

The processor_relationships table records the processing status of each document. For example:

select * from processor_relationships where document_id='e139aa7a-81bb-44cb-8eb8-a4fb172835cf';

Because processor_id is an auto-incremented value, the ID assigned to each processor can differ in some edge cases. The vast majority of the time, however, running:

select * from processors where processor_id in (3, 4, 8);

shows that:

  • 3 is usually the layout pods. If it is complete, the file made it back to layout-webhook.
  • 4 is usually the mapping step in the pre-process pod. If it is complete, the file made it to summary-client.
  • 8 is usually the document re-writer in summary-client.

Together, these entries show how far a document has traveled through the pods of the ingest pipeline.
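
To check a single document against the processors 3, 4, and 8 in one step, the sketch below filters processor_relationships by both document and processor. It assumes processor_relationships has a processor_id column that references processors, since the schema is not shown on this page. It also assumes the pod and database names from the MySQL commands on this page, and reads credentials from the DB_USER and DB_PASSWORD environment variables. Before the document ID is placed in the SQL string, it is checked to be 36 characters of hexadecimal digits and hyphens.

```shell
# Hypothetical helper: list a document's rows for processors 3, 4, and 8.
# Assumes processor_relationships has document_id and processor_id columns
# and that DB_USER and DB_PASSWORD hold your database credentials.
document_checkpoints() {
  doc_id="$1"
  # Allow only hex digits and hyphens (36 chars, like a UUID) so the ID
  # cannot break out of the SQL string literal below.
  case "$doc_id" in
    ""|*[!0-9a-fA-F-]*)
      echo "not a document ID: $doc_id" >&2
      return 1 ;;
  esac
  if [ "${#doc_id}" -ne 36 ]; then
    echo "not a document ID: $doc_id" >&2
    return 1
  fi
  if [ -z "$DB_USER" ] || [ -z "$DB_PASSWORD" ]; then
    echo "export DB_USER and DB_PASSWORD first" >&2
    return 1
  fi
  query="select * from processor_relationships where document_id='$doc_id' and processor_id in (3, 4, 8) order by processor_id;"
  kubectl -n eyelevel exec "${MYSQL_POD:-mysql-cluster-pxc-db-pxc-0}" -- \
    mysql -u "$DB_USER" -p"$DB_PASSWORD" eyelevel -e "$query"
}
```

Seeing which of the three checkpoints have rows for the document, and what state those rows are in, narrows down where the document stopped.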