In-Depth Exploration of GroundX Document Ingest
In this tutorial, we’ll cover how to add, or “ingest”, your files to GroundX.
With our proprietary ingest pipeline, your files undergo three critical processes:
Unlike other RAG solutions that require you to convert your files into plain text, GroundX is compatible with a wide variety of file formats out of the box, allowing you to expose your document data directly to an LLM without custom configuration.
More information about document parsing can be found in our guide on GroundX Ingest for Parsing. In this article, we’ll focus on the ingest pipeline in general: how to ingest local files, remotely hosted files, directories, and so on.
Before we begin, make sure you have the following information:
You may also want to prepare the following optional values:
Example:
Now that we have a GroundX bucket we can upload content to, we can explore how ingest functions in GroundX. The simplest way to ingest content into GroundX is by uploading files one at a time.
First, you’ll need to set up authentication with the GroundX client.
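As a minimal sketch of the setup step, the helper below reads the API key from an environment variable instead of hard-coding it. The variable name `GROUNDX_API_KEY` and the helper `load_api_key` are illustrative assumptions, not part of GroundX. The commented-out client construction reflects how the Python SDK is commonly used, but check it against the SDK reference for your version.

```python
import os


def load_api_key(env_var: str = "GROUNDX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or fail with a clear message.

    The environment variable name is a hypothetical convention chosen for
    this example; use whatever secret storage your project already has.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your GroundX API key.")
    return key


# Assumed SDK usage (verify against the SDK reference before relying on it):
# from groundx import GroundX
# client = GroundX(api_key=load_api_key())
```

Keeping the key out of source files means the same script can run unchanged across development and production environments.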
Once you’ve authenticated your client, you can ingest a document into GroundX via the ingest endpoint.
The file_path specified in the ingest endpoint can be either a local path or a public URL.
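Because file_path accepts two kinds of values, it can help to check which kind you have before making the request. The sketch below is a hypothetical pre-flight helper (`classify_file_path` is not a GroundX function): it treats `http` and `https` addresses as URLs and everything else as a local path that must exist on disk.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_file_path(file_path: str) -> str:
    """Return "url" for http(s) addresses and "local" for existing local files.

    Raises FileNotFoundError when a local path does not point at a file, so
    the mistake surfaces before any upload is attempted.
    """
    parsed = urlparse(file_path)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"No local file at {file_path!r}")
    return "local"
```

Note that this only checks the form of a URL. Whether the URL is publicly reachable can only be confirmed by the ingest request itself.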
After making the request, you should receive a response with processId and status. This response indicates that GroundX is uploading or ingesting your file into the indicated bucket.
The processId can be polled to get the most up-to-date upload status via the documents.get_processing_status_by_id endpoint.
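Polling can be sketched as a small loop that repeatedly fetches the status until it reaches a final state. The helper below is deliberately independent of the SDK: `fetch_status` is any callable that returns the current status string, which in practice would wrap a call to documents.get_processing_status_by_id. The terminal status names (`complete`, `error`, `cancelled`), the interval, and the timeout are assumptions made for this example.

```python
import time
from typing import Callable

# Assumed set of final states; confirm the exact values in the API reference.
TERMINAL_STATUSES = {"complete", "error", "cancelled"}


def wait_for_ingest(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch_status` every `interval` seconds until a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ingest still {status!r} after {timeout} seconds")
        sleep(interval)
```

Passing the status lookup in as a callable keeps the loop easy to test with a stub, and lets you swap in whichever client call your SDK version provides.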
If you’re using the Python SDK, you can use the ingest_directory method to ingest the contents of a directory into a particular bucket.
This method asynchronously batch-uploads all of the documents within a directory tree, starting from the top-level path you specify. It renders a tqdm progress bar and automatically polls for updates on the batch currently being uploaded.
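To illustrate the general idea of walking a directory tree and grouping its files into batches, here is a rough sketch using only the standard library. It is not the SDK's implementation of ingest_directory: the function name `iter_file_batches` and the default batch size are assumptions, and it performs no uploads, progress reporting, or polling.

```python
import os
from typing import Iterator, List


def iter_file_batches(root: str, batch_size: int = 10) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` file paths found under `root`.

    Directories and files are visited in sorted order so that the batches
    are the same from one run to the next.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort controls the order os.walk descends
        for name in sorted(filenames):
            batch.append(os.path.join(dirpath, name))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Each yielded batch could then be handed to a single ingest request, which keeps individual requests within whatever batch-size limits apply to your account.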
GroundX automatically generates contextual search data for your files. However, you can add extra search data to take maximum advantage of GroundX’s search capabilities, help maintain document context in the search query responses, and add tags or notes indicating instructions on how to handle the search results.
Example:
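As a hedged illustration of attaching extra search data, the sketch below assembles a plain dictionary describing one document. The field names `bucket_id`, `file_name`, and `search_data` are assumptions based on common SDK conventions (only file_path appears in this guide), and `build_document` is a hypothetical helper. The search data keys shown in the usage note are likewise made up for the example.

```python
import json
import os
from typing import Any, Dict, Optional


def build_document(
    bucket_id: int,
    file_path: str,
    file_name: Optional[str] = None,
    search_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a document description with optional extra search data.

    `search_data` must be JSON-serializable; json.dumps raises TypeError
    otherwise, which catches bad values before a request is sent.
    """
    document: Dict[str, Any] = {
        "bucket_id": bucket_id,
        "file_path": file_path,
        "file_name": file_name or os.path.basename(file_path),
    }
    if search_data:
        json.dumps(search_data)  # validation only; the result is discarded
        document["search_data"] = search_data
    return document
```

For instance, `build_document(1234, "https://example.com/report.pdf", search_data={"department": "legal", "note": "cite page numbers"})` produces a dictionary whose tags and notes travel with the document and can be used when handling search results.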
Processing time depends on the size of your files. For upload restrictions like file and batch size, see the prompting and integration guide.
Once ingest completes, your content is ready for search and for automated response generation, without the setup complexity typical of other RAG solutions.