Upserting documents
The upsertDocuments method adds or updates multiple documents in the catalog in a single operation. It supports several document types: text, JSON, files, URLs, and sitemaps.
Supported document types
TextDocument: For inline text or markdown content
JSONDocument: For inline JSON content
FileDocument: For file-based content (.docx, .md, .mdx, and .txt)
UrlDocument: For web page content
SitemapDocument: For scraping entire sitemap URLs
Parameters
batch: DocumentBatch - An array of documents to be upserted. All documents in the batch must have the same content type.
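Because a batch must share one content type, a mixed list of documents has to be split before upserting. The following is a minimal sketch of one way to do that, assuming that "content type" refers to the contentType field shown in the examples in this section. The helper name groupByContentType is hypothetical and is not part of the SDK.

```typescript
// Hypothetical helper (not part of the SDK): groups documents by their
// contentType field so that each group can be upserted as its own batch.
function groupByContentType<T extends { contentType: string }>(
  docs: T[],
): Record<string, T[]> {
  const groups: Record<string, T[]> = {};
  for (const doc of docs) {
    const existing = groups[doc.contentType];
    if (existing) {
      existing.push(doc);
    } else {
      groups[doc.contentType] = [doc];
    }
  }
  return groups;
}

// A mixed list: two markdown documents and one plain-text document.
const mixedDocs = [
  { documentId: "1", contentType: "markdown", content: "# some markdown" },
  { documentId: "2", contentType: "text", content: "some plain text" },
  { documentId: "3", contentType: "markdown", content: "# some more markdown" },
];

const batchesByType = groupByContentType(mixedDocs);
// batchesByType now has one entry per content type ("markdown" and "text").
```

Each resulting group could then be passed to catalog.upsertDocuments in its own call.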
Returns
A Promise that resolves when the upsert operation is complete.
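Since the method returns a Promise, callers will usually await it and decide what to do if it fails. The sketch below wraps the call and reports the outcome. It assumes the Promise rejects on failure, which this page does not state. UpsertTarget, UpsertResult, and upsertAndReport are hypothetical names introduced for illustration only.

```typescript
// Hypothetical structural type: any object exposing an upsertDocuments
// method with the shape described on this page.
interface UpsertTarget<T> {
  upsertDocuments(batch: T[]): Promise<unknown>;
}

type UpsertResult =
  | { ok: true; count: number }
  | { ok: false; error: string };

// Awaits the upsert and converts a rejection (assumed failure mode)
// into a plain result object instead of a thrown error.
async function upsertAndReport<T>(
  target: UpsertTarget<T>,
  batch: T[],
): Promise<UpsertResult> {
  try {
    await target.upsertDocuments(batch);
    return { ok: true, count: batch.length };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

A catalog obtained from client.getCatalog could be passed as the target, for example upsertAndReport(catalog, docs).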
TextDocument
Upserting inline markdown:
const catalog = await client.getCatalog("github-markdown");
const docs: TextDocument[] = [
  {
    documentId: "1",
    contentType: "markdown",
    content: "# some markdown",
    url: "https://foo.com",
    imageUrl: "https://foo.com/image.jpg",
  },
  {
    documentId: "2",
    contentType: "markdown",
    content: "# some more markdown",
    url: "https://foo.com/2",
    imageUrl: "https://foo.com/image2.jpg",
  },
];
await catalog.upsertDocuments(docs);
Upserting inline text:
const catalog = await client.getCatalog("text-catalog");
const docs: TextDocument[] = [
  {
    documentId: "1",
    contentType: "text",
    content: "some plain text",
    url: "https://foo.com",
    imageUrl: "https://foo.com/image.jpg",
  },
  {
    documentId: "2",
    contentType: "text",
    content: "some more plain text",
    url: "https://foo.com/2",
    imageUrl: "https://foo.com/image2.jpg",
  },
];
await catalog.upsertDocuments(docs);
JSONDocument
Individual JSON objects can be uploaded via batch upsert. For bulk ingestion of JSON arrays, use the JSON indexer.
const catalog = await client.getCatalog("json");
const docs: JSONDocument[] = [
  {
    documentId: "1",
    contentType: "json",
    content: {
      foo: "buzz",
      a: [5, 6, 7],
    },
    url: "https://foo.com",
    imageUrl: "https://foo.com/image.jpg",
  },
  {
    documentId: "2",
    contentType: "json",
    content: {
      foo: "bar",
      a: [1, 2, 3],
    },
    url: "https://foo.com/2",
    imageUrl: "https://foo.com/image2.jpg",
  },
];
await catalog.upsertDocuments(docs);
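When the JSON objects already exist in memory, the document entries can be built with a small mapping step instead of being written by hand. This is a sketch only: JsonDocSketch is a hypothetical local type that mirrors some of the fields in the example above, and it omits url and imageUrl, which the real JSONDocument type may require.

```typescript
// Hypothetical local type mirroring part of the example above.
// The SDK's JSONDocument type may require more fields (e.g. url, imageUrl).
interface JsonDocSketch {
  documentId: string;
  contentType: "json";
  content: Record<string, unknown>;
}

// Wraps each record in a document entry; the caller supplies the ID scheme.
function toJsonDocuments(
  records: Record<string, unknown>[],
  idOf: (record: Record<string, unknown>, index: number) => string,
): JsonDocSketch[] {
  return records.map((record, index) => ({
    documentId: idOf(record, index),
    contentType: "json" as const,
    content: record,
  }));
}

const records = [
  { foo: "buzz", a: [5, 6, 7] },
  { foo: "bar", a: [1, 2, 3] },
];

// IDs here are simply the 1-based position in the array.
const jsonDocs = toJsonDocuments(records, (_record, index) => String(index + 1));
```

For large arrays, the JSON indexer mentioned above remains the recommended path; this mapping only suits a handful of objects.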
FileDocument
Upload .txt, .md, .mdx, or .docx files:
// Assumes `catalog` was obtained with client.getCatalog(...), as in the examples above.
const docs: FileDocument[] = [
  {
    documentId: "1",
    contentType: "file",
    filePath: "./brand-guidelines.md",
    url: "https://foo.com",
    imageUrl: "https://foo.com/image.jpg",
  },
  {
    documentId: "2",
    contentType: "file",
    filePath: "./customer-testimonials.docx",
    url: "https://foo.com/2",
    imageUrl: "https://foo.com/image2.jpg",
  },
];
await catalog.upsertDocuments(docs);
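Since only four file extensions are listed as supported, it can help to screen file paths before building FileDocument entries. The sketch below does this with plain string checks. The helper names are hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```typescript
// Extensions listed on this page as supported for FileDocument.
const SUPPORTED_EXTENSIONS = [".docx", ".md", ".mdx", ".txt"];

// Hypothetical helper: true when the path ends in a supported extension.
// Case-insensitive matching is an assumption, not documented behavior.
function isSupportedFile(filePath: string): boolean {
  const lower = filePath.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Splits paths into those that can be upserted as files and those that cannot.
function partitionByFileSupport(paths: string[]): {
  supported: string[];
  unsupported: string[];
} {
  const supported: string[] = [];
  const unsupported: string[] = [];
  for (const p of paths) {
    (isSupportedFile(p) ? supported : unsupported).push(p);
  }
  return { supported, unsupported };
}

const screened = partitionByFileSupport([
  "./brand-guidelines.md",
  "./customer-testimonials.docx",
  "./logo.png",
]);
// screened.supported holds the .md and .docx paths; "./logo.png" is unsupported.
```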
UrlDocument
Upsert one or more URLs for web scraping. Upserting URLs returns immediately with a 202 Accepted response, and scraping and indexing happen asynchronously.
// Assumes `catalog` was obtained with client.getCatalog(...), as in the examples above.
const docs: UrlDocument[] = [
  {
    url: "https://www.cortexclick.com/",
    contentType: "url",
  },
];
await catalog.upsertDocuments(docs);
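Because scraping happens after the call returns, a malformed URL may not surface as an immediate error (an assumption; this page does not describe failure reporting). One option is to validate and de-duplicate URLs locally before building the batch. The sketch below uses the standard URL constructor. UrlDocSketch and toUrlDocuments are hypothetical names, and the type mirrors only the two fields shown in the example above.

```typescript
// Hypothetical local type mirroring the two fields used in the example above.
interface UrlDocSketch {
  url: string;
  contentType: "url";
}

// Builds URL document entries, skipping strings that are not valid
// http(s) URLs and dropping duplicates after normalization.
function toUrlDocuments(urls: string[]): UrlDocSketch[] {
  const seen: Record<string, boolean> = {};
  const docs: UrlDocSketch[] = [];
  for (const raw of urls) {
    let parsed: URL;
    try {
      parsed = new URL(raw); // throws on strings that are not absolute URLs
    } catch {
      continue;
    }
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      continue;
    }
    const normalized = parsed.href;
    if (seen[normalized]) {
      continue;
    }
    seen[normalized] = true;
    docs.push({ url: normalized, contentType: "url" });
  }
  return docs;
}

const urlDocs = toUrlDocuments([
  "https://www.cortexclick.com/",
  "https://www.cortexclick.com", // same page once normalized
  "not a url",
]);
// urlDocs contains a single entry for https://www.cortexclick.com/
```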
SitemapDocument
Upsert one or more sitemap documents to scrape and index an entire website. Sitemaps and sitemap indexes are traversed recursively. Upserting sitemaps returns immediately with a 202 Accepted response, and scraping and indexing happen asynchronously.
// Assumes `catalog` was obtained with client.getCatalog(...), as in the examples above.
const docs: SitemapDocument[] = [
  {
    sitemapUrl: "https://www.cortexclick.com/sitemap.xml",
    contentType: "sitemap-url",
  },
];
await catalog.upsertDocuments(docs);