Performance Tuning in the Oracle Autonomous Database: Creating Indexes for JSON Collections

By mullaned2002

February 14, 2024

161

The Oracle Database and in particular the Oracle Autonomous Database offers multiple technologies to accelerate queries and improve database performance over JSON data, including indexes, materialized views, in-memory column storage, and Exadata storage-cell pushdown.

Simple Oracle Document Access (SODA) is a NoSQL query language for storing, editing and deleting JSON documents. A SODA collection is a set of documents (in particular JSON) that is backed by an Oracle Database table or view. By default, creating a SODA document collection creates the following in the Oracle Database:

Persistent default collection metadata.

A table for storing the collection, in the database schema to which your SODA client is connected.

If the init.ora parameter compatible is at least 20, then SODA uses JSON data type by default for JSON content, and the version method is UUID. If the init.ora parameter compatible is less than 20, then SODA uses BLOB textual data by default for JSON content and the data is character data.

Collection metadata is composed of multiple components. The kind of database you use determines the collection metadata that is used by default, and which of its field values you can modify for custom metadata.

Let us consider a collection of movies within the ADMIN schema in a 19c Oracle Autonomous JSON Database (think of one JSON file per movie). After opening JSON from Database Actions, we will create an index on the MOVIES colection:

Creating an index for a JSON collection requires the name and the type. The different options are Functional, Spatial and Search.

Before creating a functional index on the runtime field (we will be running order-by queries against it), let us see what are the field needed (fields to enter):

For a functional type index, the fields to enter are:

Unique: make all indexed values unique

Index Nulls: use the index in order-by queries

Path Required: the path must select a scalar value, even a JSON null value

Properties: select the property that you want to index on, or easier is to just type * to display all available document properties in the collection

Composite Index: use more than one property

Advanced: change the storage properties of the indexed property

For search index, the options are:

Dataguide off-on: create JSON data guide for collection

Text Search off-on: index all properties in documents to support full-text search based on string equality

Range Search off-on: choose on to support range search when string-range search or temporal search (equality or range) is required

Spatial index is used to index GeoJSON geographic data. The selected property should be of GeoJSON type. See Using GeoJSON Geographic Data. For spatial index, the options are:

Path Required: the path must select a value, even if it is a JSON null value

Lax: the targeted field does not need to be present or does not have a GeoJSON geometry object as its value

The properties of the selected index appear in JSON format below the listed indexes:

We can also index a singleton scalar field using SQL, here is how:

CREATE INDEX YEAR_NDX ON movies (json_value(data, ‘$.year.number()’ ERROR ON ERROR));

Item method numberOnly() is used in the path expression that identifies the field to index, to ensure that the field value is numeric. As I have the year field in one of the JSON files as a string, I am getting the following error if I use the method numberOnly(): ORA-01722: invalid number. Method numberOnly() is used instead of method number(), because number() allows also for conversion of non-numeric fields to numbers. Clearly as some moview are produced during the same year I cannot make the index unique: ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found.

Creating a collection relational views of JSON documents is simple, select the columns we need, say title, year and runtime:

In SQL, we can use the view and order the movies by runtime:

Regardless of your database release you can create whatever Oracle Database indexes you need directly, using:

(1) the JSON Page of Using Oracle Database Actions, check Creating Indexes for JSON Collections

(2) Simple Oracle Document Access (SODA)

(3) SQL — see Indexes for JSON Data in Oracle Database JSON Developer’s Guide.

Using the JSON page in ADB is perhaps the easiest approach to indexing JSON data.

The static data dictionary views USER_SODA_COLLECTIONS lists the basic features of all of your SODA collections. Correspondently, you have also DBA_SODA_COLLECTIONS and ALL_SODA_COLLECTIONS. Collection metadata, expressed in JavaScript Object Notation (JSON) can be obtained from the last column, called JSON_DESCRIPTOR.

Users will typically work with JSON collections using native language drivers, for example, SODA for Java or SODA for Python. SODA native language drivers generally provide more throughput (operations per second) than the REST driver (SODA for REST). It is recommended to configure the SODA drivers as follows:

Enable SODA Metadata Cache: The SODA driver needs to know the metadata of each JSON collection (the column names, types, etc.). By enabling the metadata cache, roundtrips to the database can be saved, improving latency and throughput.

Enable Statement Cache: Statement caching improves performance by caching executable statements that are used repeatedly, such as in a loop or in a method that is called repeatedly. For Java, the statement cache is enabled using JDBC.

For load-balanced systems: turn off DNS caching: Load balancing allows to distribute SODA operations across different nodes. If DNS caching is turned on, then all connections are likely to use the same node and nullifying the load balancing. For Java, the following system property should be set: inet.addr.ttl=0

The database performance tuning techniques also apply to SODA: for example, SODA collections can be partitioned or sharded, and queries can be accelerated using indexes and/or materialized views.

How to monitor all that? You can turn on performance monitoring of the SQL operations that underlie a SODA read or write operation, by adding a SQL hint to the SODA operation.

Use only hint MONITOR (turn on monitoring) or NO_MONITOR (turn off monitoring). You can use this to pass any SQL hints, but MONITOR and NO_MONITOR are the useful ones for SODA, and an inappropriate hint can cause the optimizer to produce a suboptimal query plan.

Note in addition that when using SODA with Autonomous Database the following restrictions apply:

Automatic indexing is not supported for SQL and PL/SQL code that uses the SQL/JSON function json_exists. See SQL/JSON Condition JSON_EXISTS for more information but note that all the collection APIs use JSON_EXISTS for filtering so … the benefits might be negligible.

Automatic indexing is not supported for SODA query-by-example (QBE)

Final note: since Auto Indexing is disabled by default in the Autonomous Database, you can enable it by running: EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE’,’IMPLEMENT’);

Read MoreJulian Dontcheff’s Database Blog

Previous articleSkeleton-based pose annotation labeling using Amazon SageMaker Ground Truth

Next articleImmersive Stream for XR: Performance and savings with NVIDIA L4 GPU and autoscaling

Performance Tuning in the Oracle Autonomous Database: Creating Indexes for JSON Collections

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Implement UUIDv7 in Amazon RDS for PostgreSQL using Trusted Language Extensions

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Create and run AWS DMS tasks using AWS Step Functions

The key role ‘visibility’ plays in healthcare’s cybersecurity resilience

Introducing Active Assist recommendations for BigQuery capacity planning

POPULAR CATEGORY