6

Introduction to Couchbase for Oracle Developers and Experts: Part 6: Indexing

 2 years ago
source link: https://dzone.com/articles/introduction-to-couchbase-for-oracle-developers-am-5
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Overview

Oracle

Couchbase

Index Documentation

Index Documentation

Types of Indexes: 

Primary & secondary Index (B-tree)

Bitmap Index

Partial Index

Partitioned Index

Function-based index

Spatial index

Search indexes (full-text search)

Types of Indexes: 

Primary & secondary index (lock-free skiplist)

Partial index

Partitioned index

Functional-key index

Array index

Flex indexes

Search index

Spatial index

Indexing data structures

B-Tree

Bitmap

Spatial

Inverted tree

Indexing data structures

Lock-free skiplist

Inverted tree (text)

Z-curve (spatial)

CLASSES of INDEXES

Oracle indexes can be large and are managed through bufferpools.  Oracle text index can be creed with an in-memory option.

Couchbase has two classes of indexes: Standard secondary index which can be large and relevant entries are paged in based on usage, memory-optimized index, optimized for performance is entirely kept in memory and a write is done to the disk for recovery purposes.

INDEXING FEATURES

Updates;

Indexes are updated synchronously.  Changes are visible within the transaction (read your own writes).

Updates:

The indexes are updated asynchronously. However, within each transaction, changes are visible immediately (read your own writes) for all the access methods (including index scan).  Yes, we do magic!

Table Scans

For SQL indexes are optional. You need it for improving query latency, throughput, and meeting SLAs. Tables have internal mechanisms to scan the entire table, partitioned or otherwise.   

Collection Scans

Couchbase collections are hash partitioned distributed table/collection. You can retrieve a document if you have the document key. There isn’t a full scan access method just in the collection. You should build a PRIMARY INDEX that provides the equivalent of a table scan for collection. You can issue arbitrary queries on it.

Index consistency

Index updates are synchronous and are visible based on the isolation level. 

Index consistency:

Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes are always in place.  These subtle features are taken for granted in a single system and transactional RDBMS.  These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

TYPES OF INDEXES

PRIMARY KEY index

CREATE TABLE t1(c1 int primary key)

CREATE TABLE t1(c1 int, constraint c1pk PRIMARY KEY(c1))

You can create a primary key one or more columns and an index is automatically created to enforce this primary key constraint.

PRIMARY KEY index

CREATE PRIMARY INDEX ON t1;

CREATE PRIMARY INDEX ip1 ON t1; 

Each JSON document you insert into Couchbase has a separate, user-generated, unique per-collection document key that can be up to 250 bytes. The primary key is simply an index on the document keys.  The uniqueness is enforced by the collection without the need for the primary key.

SECONDARY index

CREATE INDEX i1 ON t(c1)

CREATE INDEX i1 ON t(c1, c2, c3)

CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)

These are the workhorses for an OLTP workload. These form the kernel of the index and can be combined with most other features to form sophisticated, sometimes complex indexes to power the workload. This is Luke’s lightsaber. 

SECONDARY index

CREATE INDEX i1 ON t(c1)

CREATE INDEX i1 ON t(c1, c2, c3)

CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)

The secondary indexes are similar to Oracle at a high level. Couchbase indexes do not support reverse scans. If you do need them, you need to specify the DESC order in the index definition. You can define that for each key.  Each type of index will be used for all types of range scans, it only makes a difference in performance for order-by query optimization. 

PARTIAL index

Partial Index gives us the ability to create both local and global indexes on only a subset of partitions within a partitioned table. Prior to Oracle 12c, you could not create indexes on selective partitions; Indexes always meant on all of the data.” 

PARTIAL INDEX

The concept here is similar to PostgreSQL to Oracle.

You can create indexes on any arbitrary subset of documents and the optimizer will choose the index automatically and when appropriate.

CREATE INDEX i1 ON t(c1) where c2 = “USA”;

CREATE INDEX i2 ON t(c1, c2) where c3 IN [“C”, 23, 24];

CREATE INDEX i3 ON t(c1) WHERE c4 LIKE “xyz%”

FUNCTION based index

Instead of indexing the column value AS-IS, you index the result of a function or an expression on it.

CREATE INDEX i1 ON t(LOWER(c1));

CREATE INDEX i2 ON t(c1 + (c2 * c4));

FUNCTION based index

The functionality is similar.

CREATE INDEX i1 ON t(LOWER(c1));

CREATE INDEX i2 ON t(c1 + (c2 * c4));

PARTITION index

Oracle has the widest functionality and support for partitioning for table and therefore Index: range, list, hash, interval, reference, and all. All of these combined with other features makes it powerful. This is mainly targeted for data warehousing where you’re potentially analyzing large sets of data and “logically pruning” partitions to scan for the query has a significant benefit.

PARTITION index

Couchbase collections are always hash partitioned. By default, the index is global and in a single partition. You’ll have defined the partitioned index as part of CREATE INDEX. Couchbase supports hash partitioned index, but this can be on any arbitrary expression.  Just like any hash partitioned object, queries with equality or IN expression on the partitioning key will benefit from partition pruning; others will get the benefit from parallel scans.

For partitioning by range, you’ll have to use partial index syntax and create multiple indexes.  Oracle-like range or interval index syntaxes are unavailable. 

DOMAIN indexes (Search)

You can index and query text for language awareness, stemming, etc using Oracle text.  These indexes are used by the optimizer when you have CONTAINS, CATSEARCH, or MATCHES predicates. Oracle text supports only character types and cannot index numerical or DateTime data types. 

Full-Text Search (Search)

Text search is similar to Oracle. Couchbase FTS can index text(string), numbers, booleans, and datetimes making the search usable in a larger number of use cases.  Couchbase FTS has an elaborate query language and all of its features can be used by N1QL using the SEARCH() predicate.  N1QL also exploits the FTS index for its FLEX indexing optimization for the queries. Flex index is a technology where a single index in FTS can support arbitrarily complex predicates in N1QL.  This helps when users are given the flexibility to choose custom predicates for their reports. 

Use cases:

  1. Flexible query support: Index on fields (a, b, c) can have predicates (a= 10 and (b = 20 or c between 30 and 40)); It can also have predicates ((a = 10 or c = 20) or (b between 10 and 20); etc, etc.

  2. Dynamic query support: when you create this flex index with (dynamic property set to true), FTS will index all the fields in the document, including the ones unknown at the time of index creation. This takes flexibility to new heights. Index evolves as the schema evolves. 

  3. All the cases above support searching in addition to range queries.

  4. In addition to search, FTS allows simple aggregation known as facets.

  5. FTS also helps efficient processing of multiple array predicates because it can index any number of arrays in a single index efficiently

The full-text search has a fuller functionality to enable modern agile development and effective search.

DOMAIN indexes (Spatial)

These indexes non-tradition types like point, line, polygon, etc with the ability to issues filters like overlaps, contains, and nearest neighbor.  

Couchbase SPATIAL index

Couchbase spatial index uses a z-curve data structure and is integrated into FTS indexing and querying. This also enables you to index and query scalars, arrays, text, and spatial using a single index!  You can learn more here, here, and here 

ARRAY index

Oracle calls this MULTIVALUE index – essentially, the index will have multiple entries pointing to the same doing. A normal index will have one index entry per row.

CREATE MULTIVALUE INDEX mvi_1 ON mytable t

      (t.jcol.credit_score.numberOnly());

ARRAY index

The array is THE difference between the relational model and the JSON model.   — Gerald Sangudi

As we saw in the data model section, it’s easy to store the array, but difficult to index.  Couchbase supports the most generalized array indexing in databases. From simple array indexes on a field, set of fields, and expressions on each one of those. As mentioned above, Using FTS, Couchbase can index multiple array fields in a single index and push down multiple query predicates to the index scan to improve performance. 

BITMAP index

Unavailable. 

Index High Availability

Oracle allows the creation of multiple indexes with the same index definition. So, if one of the indexes is unavailable (rebuild/etc), the other can be used for queries.

Index High Availability

You can simply create additional copies of the index by specifying the num_replica parameter to CREATE INDEX statement. The query engine automatically distributes the workload between the indexes based on the performance and workload. 

Index consistency

Index updates are synchronous and are visible based on the isolation level. 

Index consistency:

Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place.  These subtle features are taken for granted in a single system and transactional RDBMS.  These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

INDEX ADVISOR

Oracle’s SQL advisor includes an index advisor among other things.

INDEX ADVISOR

Couchbase has to ADVISE statement, ADVISOR function, and advisor service. 


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK