Catalogs

Catalogs let you ingest, organize, and clean your data, and keep it up to date. Once data is in a catalog, it can be consumed by cortexes and the intelligent content engine.

You should create a catalog for each data source. You might create one for your marketing and documentation website, one for your code examples hosted on GitHub, and another for internal go-to-market and sales materials from Google Drive. Separating your data into individual catalogs enables fine-grained control over which data sources each cortex has access to.
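As a rough illustration of the access control this separation enables, the sketch below models three catalogs and two cortexes as plain Python dictionaries. All catalog and cortex names, and the `catalogs_for` helper, are hypothetical and exist only for this example. They are not part of the Cortex Click app or SDK.

```python
# Hypothetical catalog names, one per data source (mirrors the examples above).
CATALOGS = {
    "website": "Marketing and documentation website",
    "code-examples": "Code examples hosted on GitHub",
    "sales-materials": "Internal go-to-market and sales materials from Google Drive",
}

# Hypothetical cortexes, each granted only the catalogs it should draw from.
CORTEX_CATALOGS = {
    "public-docs-assistant": ["website", "code-examples"],
    "internal-sales-assistant": ["website", "sales-materials"],
}


def catalogs_for(cortex_name: str) -> list[str]:
    """Return the catalogs a cortex may read, failing loudly on unknown names."""
    names = CORTEX_CATALOGS[cortex_name]
    unknown = [name for name in names if name not in CATALOGS]
    if unknown:
        raise KeyError(f"unknown catalogs: {unknown}")
    return names


public_sources = catalogs_for("public-docs-assistant")
internal_sources = catalogs_for("internal-sales-assistant")
```

Because the sales materials live in their own catalog, the public-facing cortex in this sketch never sees them, while the internal one does.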

To create a catalog, navigate to the app, select the catalogs tab, and click "create catalog". From there you can customize the description and instructions for your catalog. The description and instructions you provide tell the cortex what the data set contains and how it should be used.


Ingesting data into a catalog

Content can be ingested into catalogs through the app or through a variety of programmatic indexers available in the SDK.

Web Scraping

Cortex Click supports web scraping to ingest entire websites such as blogs, marketing sites, and documentation sites. This is the most common way to ingest data into the platform. You can either submit a sitemap.xml file that links to all pages on your website, or submit each URL individually.

Most websites publish a sitemap at example.com/sitemap.xml that points to every published page on the site. Your site might have multiple sitemaps, for example example.com/sitemap.xml and docs.example.com/sitemap.xml.
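Before submitting a sitemap, it can help to check which pages it actually lists. The sketch below uses only the Python standard library and does not call Cortex Click. The inline `SAMPLE_SITEMAP` string and the `list_sitemap_urls` helper are assumptions made for illustration; in practice you would load the XML from your own site.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Inline sample standing in for the contents of example.com/sitemap.xml.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://docs.example.com/getting-started</loc></url>
</urlset>
"""


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


urls = list_sitemap_urls(SAMPLE_SITEMAP)
for url in urls:
    print(url)
```

If a page you expect is missing from the list, it will likely need to be submitted as an individual URL.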

From the catalogs tab, navigate to the catalog you'd like to upload your website to, click on the "upload a document" widget, and select the web scraping tab.


Clicking "upload document" triggers an asynchronous upload. Scraping a large sitemap with hundreds or thousands of pages can take several minutes.

Manually uploading documents

From the catalogs tab, select the catalog you'd like to upload to by clicking "view", and then click on the "upload a document" widget.


This gives you the option to either upload a file or manually input text. You can also specify fields like Document URL or Image URL so that the intelligent content engine can insert citations to the document when it is used as part of generation.

Scheduled updates

From the catalog page, you can create an indexer to automatically update your content. Indexers can run daily, weekly, or monthly depending on your needs.


Indexers currently support web scraping, and we're working on adding support for more sources.

Programmatic uploads

The Cortex Click SDK provides a variety of utilities for programmatically uploading content. This is useful for uploading large amounts of content and for scheduling updates.
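When uploading a large amount of content, sending documents in batches is one common approach. The sketch below shows only the batching logic and makes no SDK calls: `upload_batch` is a caller-supplied function standing in for whatever upload utility you use, and the `content` and `url` field names are hypothetical, not the SDK's actual document schema.

```python
from typing import Callable, Iterator

# Hypothetical document shape: plain string fields such as "content" and "url".
Document = dict[str, str]


def batched(items: list[Document], size: int) -> Iterator[list[Document]]:
    """Yield consecutive slices of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(
    documents: list[Document],
    upload_batch: Callable[[list[Document]], None],
    batch_size: int = 50,
) -> int:
    """Send documents through `upload_batch` in batches; return the count sent."""
    sent = 0
    for batch in batched(documents, batch_size):
        upload_batch(batch)
        sent += len(batch)
    return sent


# Stand-in uploader that just records each batch it receives.
received: list[list[Document]] = []
documents = [
    {"content": f"Page {i}", "url": f"https://example.com/page-{i}"}
    for i in range(120)
]
total = upload_in_batches(documents, received.append, batch_size=50)
```

Passing the uploader in as a function keeps the batching logic independent of any particular client, so the same loop can be exercised with a stand-in like the one above before it is pointed at a real upload call.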

If you need help uploading content or keeping it up to date, reach out and we'd be happy to help.

Roadmap

We're working on turning each of our programmatic indexers into one-click integrations within the Cortex App. If you need a particular integration that isn't available, we'd be happy to build it for you.