© Copyright IBM Corporation 2023
Using Elasticsearch and OpenSearch for Content Indexing and Content-Based Retrievals (CBR)
The use of Elasticsearch and OpenSearch was first introduced as a preview capability in the 5.5.8 release.
In the 5.5.12 release of FileNet Content Manager and IBM Content Foundation, Elasticsearch or
OpenSearch can be used as a fully supported alternative to IBM Content Search Services (CSS) for
content indexing and for content-based retrieval (CBR). This new capability can be used with traditional
installations of Content Platform Engine and with containerized installations.
Elasticsearch or OpenSearch licenses must be purchased separately; they are not included in the FileNet
Content Manager or IBM Content Foundation bundles. Refer to the appropriate software product
compatibility report (SPCR) for information on the supported levels of Elasticsearch and OpenSearch.
Reports can be generated here: http://www.ibm.com/software/reports/compatibility/clarity/index.html
Some components within Cloud Pak for Business Automation (CP4BA) embed Elasticsearch and
OpenSearch. However, these embedded versions are not licensed for use with FileNet Content Manager
or IBM Content Foundation. In addition, when installing the content pattern with the CP4BA operators,
only CSS is available for content indexing and for content-based retrieval.
This new content-based search feature operates similarly to CSS. The Document, Annotation, Folder, and
Custom Object classes, as well as the string properties of those classes can be enabled for full text
indexing.
The CONTAINS clause in a full text search query is very similar for retrieving content using CSS,
Elasticsearch, or OpenSearch; therefore, existing queries do not need to change if a decision is made to
index content with Elasticsearch or OpenSearch rather than CSS.
There are differences in the results that might be returned by the different technologies as the stemming
algorithms are different.
This white paper provides insight into the new content-based search feature and identifies the
differences between the technologies used for content-based indexing and searching.
Important: In this white paper, as well as in the Administration Console for Content Engine (ACCE), the
term Elasticsearch is used to refer to both Elasticsearch and OpenSearch.
Contents
Using Elasticsearch and OpenSearch for Content Indexing and Content-Based Retrievals (CBR)
Search Engine to Object Store Mapping
Elasticsearch Index Areas
Indexing Pipeline
Configuring an Environment to use Elasticsearch
  Supported Elasticsearch and OpenSearch Releases
  Configure an Elasticsearch Cluster to Allow Content Platform Engine Access
  Required Permissions
  Solid-State Storage Device
  Configure Content Platform Engine
  Additional Step for Windows Environments
Reindexing
Full Text Query Syntax
Word Stemming
Best Practices
  Selecting the Number of Shards for an Index Area
  Selecting the Language Analyzers for an Object Store
  Setting the Maximum Workers for the Indexing Queue Sweep
  Setting the Reindexing Sweep Job Parameters
  Tuning Content Based Retrieval Indexing with Elasticsearch for High Volume Work Loads
  Backup and Recovery
Monitoring Performance
  PCH and Log File Counters
  Using System Dashboard to Monitor Elasticsearch
  Indexing Logging
  Search Logging
  Tuning
Known Issues
  Read Time-out
  Index Statistics
  Content Truncation
Search Engine to Object Store Mapping
The search engine to object store mapping is as follows:
• The Content Platform Engine domain supports both the new Elasticsearch content-based search
  feature and CSS.
• A domain can support only one Elasticsearch content-based search cluster.
• An object store can support either Elasticsearch content-based search or CSS, but not both.
Figure 1 illustrates this mapping.
Figure 1: Search Engine to Object Store Mapping
Elasticsearch Index Areas
An Object Store can support one Elasticsearch index area. The index area contains one Elasticsearch
index per enabled root class. For instance, one index for the Document class and all the subclasses of the
Document class, and one index for the Custom Object class and all the subclasses of the Custom Object
class. In comparison, CSS supports multiple index areas per object store, and multiple partitioned
indexes per index area. The Elasticsearch index scale-out is based on index sharding, which is controlled
by Elasticsearch.
Indexing is usually quicker when you have more shards, as each document needs to be stored only once
per shard. If fast ingestion is the major concern, there should be at least one shard per data node.
For more information on shards and nodes, refer to this documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
Indexing Pipeline
The indexing pipeline used for Elasticsearch uses a queue sweep for indexing called the Elasticsearch
Indexing Queue Sweep.
The Content Platform Engine object server code creates index queue requests for all newly created,
updated, or deleted objects that are Content Based Retrieval (CBR) enabled. An Index Job creates queue
entries when a class is first enabled for indexing, or when a re-index request is made. The index entries
are processed by the indexing queue sweep using the Elasticsearch REST API.
Figure 2 illustrates the indexing pipeline for Elasticsearch and for CSS.
Figure 2: The indexing pipeline for Elasticsearch and for CSS
Configuring an Environment to use Elasticsearch
Supported Elasticsearch and OpenSearch Releases
Refer to the appropriate software product compatibility report for supported Elasticsearch and
OpenSearch versions and for any additional caveats. Use the following website to generate a report:
http://www.ibm.com/software/reports/compatibility/clarity/index.html
Configure an Elasticsearch Cluster to Allow Content Platform Engine Access
You must configure the Elasticsearch cluster to allow access by the Content Platform Engine. The Content
Platform Engine supports both IP-based security and username/password-based security.
For IP-based security, access to the Elasticsearch cluster is controlled by the client IP addresses that can
access the cluster (normally within a VPN). As a result, the Content Platform Engine Elasticsearch cluster
object is created without specifying a username or a password.
For username/password-based security, create credentials on the Elasticsearch cluster that allow access
by the Content Platform Engine. Set the specified credentials as the username/password properties on
the Content Platform Engine Elasticsearch cluster object.
Required Permissions
The Elasticsearch users and roles configured for the Content Platform Engine Elasticsearch cluster must
have the following access rights:
• Read access to the cluster base URL to test connectivity to the cluster.
• Full control access to indexes with the prefix name of 'fncm_*'.
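The two security modes and the read access to the cluster base URL can be illustrated with a minimal Python sketch. It assumes that username/password-based security is exposed by the cluster as HTTP Basic authentication; the host name and credentials in the usage note are hypothetical, and the Content Platform Engine's own connectivity test is not shown here.

```python
import base64
import urllib.request


def build_cluster_request(base_url, username=None, password=None):
    """Build a GET request for the cluster base URL.

    With IP-based security no credentials are supplied. With
    username/password-based security an HTTP Basic header is added
    (ASSUMPTION: the cluster accepts Basic authentication).
    """
    request = urllib.request.Request(base_url, method="GET")
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def check_connectivity(request, timeout_seconds=10):
    """Return True if the cluster answers the request with HTTP 200."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status == 200
```

For example, check_connectivity(build_cluster_request("https://search.example.com:9200", "cpe_user", "secret")) exercises the same base URL that the read permission covers.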
Solid-State Storage Device
Document indexing is a disk-intensive activity. In production environments, use a solid-state storage
device (SSD) for the cluster storage to limit the possibility of disk I/O being a performance bottleneck.
Configure Content Platform Engine
Use ACCE to configure the Content Platform Engine to use Elasticsearch:
1. Create an Elasticsearch cluster at the domain level. Select Domain > Global Configuration >
Administration > Elasticsearch Clusters.
There can be only one Elasticsearch cluster per domain. However, the cluster can be shared by
multiple object stores.
Supply either the name of the Elasticsearch load balancer or a list of host names.
If you provide a list of host names, the Content Platform Engine distributes requests across the set of
hosts on a round-robin basis.
You must also provide a port for accessing the cluster. The CPE does not supply a default port.
Append the port number to each host name; for instance, MyElasticSearchCluster:9200 or
MyOpenSearchCluster:443.
2. Create an OpenSearch index area at the object store level. Expand the appropriate object store
node, then select Administrative > Index Areas.
It is important to consider the number of shards and replicas per index when you create the index
area. For performance reasons, consider using the same number of shards and nodes as a starting
configuration. Once content is indexed, changing the number of shards or replicas requires content
to be re-indexed. Refer to the following section in this document for additional best practice
information: Best Practices.
3. Enable Elasticsearch at the object store level. Select the appropriate object store node, then select
the Text Search tab and check the Enable full text indexing and search option.
On the Elasticsearch tab, select the indexing analyzers you want to use. There are two nonlanguage
analyzers: simple and fncm_email_analyzer.
Analyzers are applied when objects are first indexed, and if the analyzer list is changed, the objects
must be reindexed. More information on selecting analyzers is provided in the Best Practices section
of this document.
4. Enable CBR on classes and properties. Once a class is enabled for CBR, newly created objects of the
class are automatically indexed. If there are existing objects for classes that are CBR enabled, you
can index those objects by selecting one of the Index Class for Content Search options.
The option to enable CBR on a class is on the General tab of the class definition.
Use the Enable CBR checkbox on a string property to index the content of string properties. Navigate
to the property definition in the appropriate classes, then check the Enable CBR option. The Enable
CBR checkbox is on the General tab of the property definition.
5. Monitor the Elasticsearch indexing queue sweep process using ACCE.
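The host list rule from step 1 (every host name carries its own port, and requests rotate across the hosts) can be sketched in Python as follows. This is an illustration only: the comma-separated input format and the function names are assumptions, and the Content Platform Engine's internal implementation is not documented here.

```python
from itertools import cycle


def parse_hosts(host_list):
    """Split a comma-separated host list and require a port on each entry."""
    hosts = []
    for entry in host_list.split(","):
        entry = entry.strip()
        name, separator, port = entry.rpartition(":")
        # No default port is supplied, so an entry without one is rejected.
        if not separator or not name or not port.isdigit():
            raise ValueError(f"missing or invalid port in host entry: {entry!r}")
        hosts.append((name, int(port)))
    return hosts


def round_robin(hosts):
    """Return an iterator that yields the hosts in order, wrapping around."""
    return cycle(hosts)
```

Calling next() repeatedly on the iterator returned by round_robin(parse_hosts("node1:9200, node2:9200")) alternates between the two hypothetical hosts.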
Additional Step for Windows Environments
On CPE Windows servers, install the Microsoft 2013 and 2019 Redistributable files if the files are not
already on the server. The files are needed for CBR indexing and searching, as well as for generating
thumbnails.
Reindexing
The Elasticsearch feature does not support the Index Resync capability that is available for CSS. The Index
Resync mechanism is the recommended best practice for resynchronizing indexes, as a fine-grained
filter expression can be used to reindex a small set of objects. For more information, refer to
https://www.ibm.com/docs/en/filenet-p8-platform/5.5.12?topic=failures-fixing-out-sync-full-text-index
The two options for Elasticsearch reindexing are:
• Reindex all the objects in an entire class (Document, Annotation, Custom Object, or Folder)
• Use the Full Text Reindex Job Sweep to reindex specific objects
The Full Text Reindex Job Sweep provides a mechanism to submit CBR-enabled Documents, Annotations,
Custom Objects, or Folders for reindexing, based on a filter expression. (See Sweep filter conditions:
https://www.ibm.com/docs/en/filenet-p8-platform/5.5.12?topic=sweeps-sweep-filter-conditions). The
primary use case for the Full Text Reindex Job Sweep is for reindexing CBR-enabled objects that are
indexed under Elasticsearch.
A Full Text Reindex Job Sweep operates on objects selected by a filter expression and a CBR-enabled
class (Document, Annotation, Custom Object, or Folder). The sweep can act on a single class, or the
sweep can also process the objects in the subclasses. Initiate the sweep from ACCE. Navigate to the
appropriate object store and then to Sweep Management > Job Sweeps > Full Text Reindex Jobs.
The Job Sweep processes objects that satisfy the filter expression and class conditions, and then inserts
the index requests into the indexing queue to be processed by the indexer. Monitor the progress and
results of a Full Text Reindex Job Sweep using ACCE.
The default settings for the Full Text Reindex Job Sweep are:
• Inter-Batch Delay: 1,000 milliseconds
• Sweep Batch Size: 200
• Target class: Document
• Effective start and end dates: Blank
The Sweep Job inserts rows into the indexing queue very quickly, which might overload the queue, and
prevent newly added objects from being indexed in a timely fashion. Use the Inter-batch delay to limit
the processing rate of the Sweep Job.
Full Text Query Syntax
Refer to the following IBM Documentation topic for details on how the CONTAINS clause is used by CSS:
https://www.ibm.com/docs/en/filenet-p8-platform/5.5.12?topic=reference-cbr-queries
The syntax of the CONTAINS clause for Elasticsearch is very similar. There is one advanced feature that
applies to Elasticsearch, which is the optional ability to supply the analyzer name to be used for the
search. Normally all analyzers are used. For example, to search the document titles for the term 'arches'
using only the English analyzer, supply the analyzer in the CONTAINS clause as follows:
CONTAINS(d.*,'documenttitle.english:arches')
The query syntax used within the CONTAINS clause is Elasticsearch specific, with some minor differences
from the CSS query syntax. The Content Platform Engine passes the value supplied in the CONTAINS
clause to Elasticsearch as the query string value. For more information, refer to the following
documentation: https://opensearch.org/docs/latest/query-dsl/full-text/index/.
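As a minimal sketch of how the optional analyzer name fits into the CONTAINS clause, the following Python helper assembles the clause text shown in the example above. The function name is hypothetical, and doubling single quotes to escape them is an assumption based on common SQL practice, not a documented Content Platform Engine rule.

```python
def contains_clause(alias, query, field=None, analyzer=None):
    """Return the text of a CONTAINS clause.

    field and analyzer are optional; an analyzer is written as a suffix
    of the field name, as in 'documenttitle.english:arches'.
    """
    if analyzer is not None and field is None:
        raise ValueError("an analyzer must be attached to a field name")
    target = query
    if field is not None:
        prefix = field if analyzer is None else f"{field}.{analyzer}"
        target = f"{prefix}:{query}"
    # ASSUMPTION: single quotes inside the literal are escaped by doubling.
    escaped = target.replace("'", "''")
    return f"CONTAINS({alias}.*,'{escaped}')"
```

Calling contains_clause("d", "arches", field="documenttitle", analyzer="english") reproduces the clause from the example, while contains_clause("d", "arches") leaves the analyzer choice to the defaults.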
Word Stemming
Stemming is the process of reducing words to their base or root form. CSS and Elasticsearch use different
methodologies to derive the stems of words. The word stemming determines what documents are found
with a CBR search.
CSS uses dictionary stemming, while Elasticsearch uses algorithmic stemming. The two stemming
methods can produce different results. For instance, non-dictionary words are stemmed by Elasticsearch,
but not stemmed by CSS.
By default, Elasticsearch uses Porter algorithmic stemming. For more information, refer to the following
link: https://snowballstem.org/algorithms/porter/stemmer.html
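The difference between the two approaches can be illustrated with a toy Python sketch. The algorithmic function applies only the plural-stripping rules of step 1a of the Porter algorithm, not the full stemmer, and the dictionary is a hypothetical three-word lexicon; neither represents the actual CSS or Elasticsearch implementation.

```python
def algorithmic_stem(word):
    """Apply only the plural rules of Porter step 1a (a small subset)."""
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


# Hypothetical toy dictionary; a real dictionary stemmer uses a full lexicon.
TOY_DICTIONARY = {"arches": "arch", "ponies": "pony", "cats": "cat"}


def dictionary_stem(word):
    """Look the word up; words missing from the dictionary are unchanged."""
    word = word.lower()
    return TOY_DICTIONARY.get(word, word)
```

A made-up word such as "blorps" is reduced to "blorp" by the rule-based function but is returned unchanged by the dictionary lookup, which is the kind of difference that can change which documents a search finds.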
Best Practices
This section covers the following topics:
• Selecting the appropriate number of shards for an index
• Selecting the appropriate language analyzers
• Setting the maximum number of indexing queue sweep workers
• Setting the reindexing sweep job parameters
• Tuning CBR indexing when there is a high work load
• Backup and recovery
Selecting the Number of Shards for an Index Area
The Content Platform Engine takes advantage of Elasticsearch index sharding to achieve index scale-out.
It is important to use enough shards when you create the index area, as the number of shards cannot be
changed without performing a full reindex. The OpenSearch administrator needs to determine the
correct number of shards for the index area based on predicted ingestion volume, number of shards the
cluster can manage, and so on. The number of replicas needs to be considered when you create the
index area, since the replicas provide high availability for the indexed data.
Indexing is usually quicker when you have more shards, as each document needs to be stored only once
per shard. If fast ingestion is the major concern, you should have at least one shard per data node.
The ratio of shards to nodes is very important. Indexing work is sent to the shards in a round robin
fashion. To ensure the best performance, the number of shards should be a multiple of the number of
nodes. For instance, if there are three data nodes, configure three, six, or nine shards. If the number of
shards is not a multiple of the number of nodes, performance degrades because the workload is not
spread out evenly over the nodes.
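The multiple-of-nodes rule can be expressed as a short Python sketch. The function names and the default limit of three multiples are illustrative assumptions; the right shard count still depends on ingestion volume and cluster capacity.

```python
def balanced_shard_counts(data_nodes, max_multiple=3):
    """List shard counts that are whole multiples of the data node count."""
    if data_nodes < 1 or max_multiple < 1:
        raise ValueError("data_nodes and max_multiple must be at least 1")
    return [data_nodes * multiple for multiple in range(1, max_multiple + 1)]


def is_balanced(shards, data_nodes):
    """Return True if the shards spread evenly over the data nodes."""
    return shards >= data_nodes and shards % data_nodes == 0
```

For three data nodes, balanced_shard_counts(3) gives the three, six, and nine shards mentioned above, while is_balanced(4, 3) flags a configuration that would leave one node with more work than the others.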
Selecting the Language Analyzers for an Object Store
Elasticsearch uses analyzers for indexing and for searching. The analyzers used by the Content Platform
Engine are set at the object store level and are applied to all CBR-enabled classes in the object store.
If the analyzer list changes, a reindex is required. The recommendation is to use the simple analyzer and
one language analyzer for each of the languages in which documents ingested into the object store
might be written.
The simple analyzer breaks tokens on punctuation. Without the simple analyzer, sentences that lack
spaces between the punctuation are not tokenized as expected.
However, using the simple analyzer can cause problems with searches not finding strings with numbers.
For example, ‘PO3025721’ is tokenized as just ‘po’. This can cause the search results to match far more
documents than expected.
The fncm_email_analyzer is a custom analyzer designed to handle information in emails.
For more information about Analyzers, refer to the following blog: https://opensearch.org/docs/latest/
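The simple analyzer behavior described in this section, including the ‘PO3025721’ case, can be approximated with a few lines of Python. This is a rough emulation under the assumption that the analyzer splits on every non-letter character and lowercases the result; it is not the analyzer code itself.

```python
import re

# One or more Unicode letters: word characters minus digits and underscore.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_analyze(text):
    """Approximate the simple analyzer: keep letter runs, lowercased."""
    return [token.lower() for token in _LETTER_RUN.findall(text)]
```

With this emulation, simple_analyze("PO3025721") yields only the token "po", which is why a search for that purchase order number can match every document containing "po", and simple_analyze("end.Start,next") still separates the three words even though no spaces follow the punctuation.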
Setting the Maximum Workers for the Indexing Queue Sweep
The indexing queue sweep is created automatically during Elasticsearch indexing. The sweep is defined
with a default of eight workers. In some cases, you might need to increase the number of workers to
improve indexing throughput. It can be difficult to estimate if an increase in the indexing queue workers
impacts indexing throughput, or if the increase in the workers has a negative impact on other Content
Platform Engine operations. If you are considering increasing the number of workers, do so incrementally
while you monitor the system by using the System Dashboard or with other tools that can monitor the
Content Platform Engine system resources (CPU, memory, and other resources). Refer to the section on
Monitoring Performance for more information on tracking system performance.
Setting the Reindexing Sweep Job Parameters
The Reindexing Sweep Job can insert rows into the indexing queue very quickly, which might overload
the queue, and prevent newly added objects from being indexed in a timely fashion. Use the Inter-batch
delay to limit the processing rate of the Sweep Job. For example, assume the sweep uses the following
settings:
• Workers: 1
• Batch size: 200
• Inter-batch delay: One second (1,000 milliseconds)
With these settings the sweep rate, that is, the rate at which rows are inserted into the indexing queue, cannot
exceed 200 per second. As a best practice, set the Inter-batch delay to 1 second (1,000 milliseconds) or
higher, set workers to 1, and set the batch size to 200.
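The upper bound in this example follows from simple arithmetic, sketched here in Python. The formula assumes that each worker inserts at most one batch per inter-batch delay interval; the treatment of multiple workers in particular is an assumption, since the example above uses a single worker.

```python
def max_sweep_rate(workers, batch_size, inter_batch_delay_ms):
    """Upper bound on rows inserted into the indexing queue per second.

    ASSUMPTION: each worker inserts at most one batch per delay interval.
    """
    if inter_batch_delay_ms <= 0:
        raise ValueError("inter_batch_delay_ms must be positive")
    return workers * batch_size * 1000.0 / inter_batch_delay_ms
```

With the settings listed above, max_sweep_rate(1, 200, 1000) gives 200 rows per second, and doubling the delay to 2,000 milliseconds halves the bound to 100.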
Tuning Content Based Retrieval Indexing with Elasticsearch for High Volume Work Loads
For high volume ingestion, migration, or re-indexing scenarios, the default configuration for the
Elasticsearch Indexing Queue Sweep might not be sufficient. A high ingest rate can cause the number of
indexing requests to build up. In turn, this can cause a significant delay in documents being included in
search results. Making the following changes can help improve indexing throughput rate:
• Increasing the number of sweep workers
  This is the recommended method of tuning indexing performance. To change the maximum number
  of sweep workers, go to Object Store > Sweep Management > Queue Sweeps > Elasticsearch
  Indexing Queue Sweep > Properties > Maximum Sweep Workers.
• Changing the bulk batch size
  The indexing rate can be affected by tuning the BulkAPI batch size. The BulkAPI batch size controls
  the size of batches that Content Platform Engine submits for indexing to the Elasticsearch cluster.
  The default BulkAPI batch size is 40 documents. Alter the setting using the following JVM argument:
  ecm.elasticsearch.bulkapi.max.batch.size
Backup and Recovery
Recommendations for backup and recovery:
• Use shard replicas for high availability. The shard replicas enable Elasticsearch to automatically
  recover corrupted shards.
• Take periodic Elasticsearch index snapshots in case a manual recovery is needed.
• If an index needs to be recovered, use a reindex sweep job to synchronize the index with the object
  store data. For example, if an index is recovered to the current time minus four hours, run a reindex
  sweep job using a filter expression that processes objects with a last modified date of the current
  time minus four hours.
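The cutoff for the recovery example above can be computed as in the following Python sketch. The property name DateLastModified, the comparison operator, and the timestamp literal format are assumptions made for illustration; check the Sweep filter conditions documentation for the exact filter expression syntax before using such an expression.

```python
from datetime import datetime, timedelta, timezone


def reindex_cutoff(restore_age_hours, now=None):
    """Return the UTC time to which the index was restored."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(hours=restore_age_hours)


def last_modified_filter(cutoff):
    """Build a filter that selects objects modified at or after the cutoff.

    ASSUMPTION: property name and literal format are illustrative only.
    """
    return "DateLastModified >= " + cutoff.strftime("%Y%m%dT%H%M%SZ")
```

For an index restored to a point four hours in the past, last_modified_filter(reindex_cutoff(4)) produces an expression that covers every object changed since the restore point.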
Monitoring Performance
This section provides details of the information provided in the following:
• PCH and log file counters
• Trace logs when CBR tracing is enabled. There are different entries when indexing content and when
  searching for content.
PCH and Log File Counters
The CBR layer provides a rich set of high-level PCH counters, many of which apply to both CSS and
Elasticsearch. In addition, there is the following set of Elasticsearch-specific PCH counters:
Counter                   Description
IndexReadSource           Time spent reading source content to perform indexing
IndexJSONPrep             Time spent preparing extracted text (as JSON) for indexing
Query                     The number of times Elasticsearch queries are run
QueryHits                 The number of results from the Elasticsearch queries
ScalarQuery               Number of queries based on a set of object IDs
ScalarQueryHits           The number of results found by the scalar queries
CountQuery                The number of times a COUNT query is run
CountQueryHits            A count of the hits found by the queries (using a Select COUNT query)
GetObjectById             The number of queries that search for a single object
SubmitBulkBatch           The number of times a call is made in a batch to create, update, or
                          delete index entries
SubmitBulkBatchItems      Total number of items processed by a bulk batch submission
SubmitBulkBatchFailures   The number of times the batch submission fails
BulkObjectsCreated        Within the batch, the number of index entries created
BulkObjectsUpdated        Within the batch, the number of index entries updated
BulkObjectsDeleted        Within the batch, the number of index entries deleted
BulkObjectsIgnored        Within the batch, the number of index entries that don’t need to be
                          updated
BulkObjectsFailed         Within the batch, the number of index requests that failed
The counters are periodically written to the log file if CBR Summary trace is enabled and there is indexing
or search activity. Look for ElasticCounters in the logs. Following is a sample entry:
2022-02-01T18:13:43.062 0000007A CBR FNRCE0000D - DEBUG
ElasticCounters.objects_indexed: 742, indexed_data_size: 23861273,
objects_deleted: 0, batches_submitted: 27, batch_items_submitted:
742, queries: 0, query_hits: 0, scalar_queries: 0,
scalar_query_hits: 0, count_queries: 0, count_query_hits: 0,
get_object_by_id: 742, initiate_pit: 0
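An ElasticCounters entry such as the sample above can be turned into a dictionary for trending or alerting. The following Python sketch assumes the entry keeps the "name: value" layout shown in the sample; the function name is hypothetical.

```python
import re

# A counter is a name, a colon, and an integer value.
_COUNTER = re.compile(r"(\w+):\s*(\d+)")

SAMPLE_ENTRY = """2022-02-01T18:13:43.062 0000007A CBR FNRCE0000D - DEBUG
ElasticCounters.objects_indexed: 742, indexed_data_size: 23861273,
objects_deleted: 0, batches_submitted: 27, batch_items_submitted:
742, queries: 0, query_hits: 0, scalar_queries: 0,
scalar_query_hits: 0, count_queries: 0, count_query_hits: 0,
get_object_by_id: 742, initiate_pit: 0"""


def parse_elastic_counters(entry):
    """Return the counters of an ElasticCounters log entry as a dict."""
    marker = "ElasticCounters."
    start = entry.find(marker)
    if start < 0:
        return {}
    # Skip the timestamp and prefix so only the counter list is scanned.
    body = entry[start + len(marker):]
    return {name: int(value) for name, value in _COUNTER.findall(body)}
```

From the parsed sample, dividing indexed_data_size by objects_indexed gives the average amount of indexed data per object for the reporting interval.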
Using System Dashboard to Monitor Elasticsearch
System Dashboard displays the CBR counters for Elasticsearch under the Server-Based Counters. For CSS,
counters are displayed under the FileNet Content Platform Engine node under the object store.
In Figure 3, the Bulk API Objects Created counter can be used to check on the CBR indexing rate.
Figure 3: Viewing indexing rate
Figure 4 shows the number of items processed by a sweep process.
Figure 4: Objects processed by a sweep process
Indexing Logging
This section explains the log entries that occur when the different logging levels are enabled for CBR.
CBR summary trace level
Following is an example of the trace entry produced by the Elasticsearch indexing queue sweep
processing (ElasticSearchIndexer.java) after each batch is processed. Note that the queue sweep uses
multiple workers, which means that the output from a single entry does not indicate overall throughput.
2022-02-23T11:42:44.047 00000191 CBR FNRCE0000D - DEBUG
Elasticsearch index batch: 200 items in 40876 ms,
create_update_in_batch 200, delete_in_batch 0, not_in_batch 0
(initialize 219 ms, extract_text 1469 ms, extract_text_cleanup 156
ms, index 38954 ms, update_index_counters 0 ms,
propagate_failure_codes 78 ms, content_bytes 2564998,
largest_content
406443)>>200,40876,200,0,0,219,1469,156,38954,0,78,2564998,406443
The elements in the entry are described in the following table. The integers after the double broken
brackets (>>) marker are the values from the summary trace line in an easy to parse format. All duration
values are in milliseconds.
Component                 Meaning
index batch               The number of items in the batch. As each item is prescreened before being
                          inserted into the indexing queue, each item has some form of processing applied to
                          it, even if the processing simply determines that the text cannot be extracted.
items in ...              The overall batch processing duration.
create_update_in_batch    The number of create and update requests contained in the batch sent to
                          Elasticsearch.
delete_in_batch           The number of delete requests contained in the batch sent to Elasticsearch.
not_in_batch              The number of queue sweep entries not included in the batch. For instance, if a
                          delete references a non-existent Elasticsearch object, this is indicated in the
                          not_in_batch value.
initialize                Overhead for setting up the batch for indexing.
extract text              Time spent extracting text using Oracle Outside In Technology (OIT).
extract text cleanup      Time spent deleting temporary files, and so on.
index                     Time spent indexing the batch. This time is used to make calls to Elasticsearch.
update index counters     Time spent updating the Index Area API object with the current Elasticsearch index
                          counters: number of documents and size of indexes.
propagate failure codes   Time spent updating the last failure code on objects. This action occurs only if the
                          indexing failure recording level on the object store is set to “propagate to source.”
                          If an error occurs, the failure code is written to the index_failure_code property on
                          the document that has the issues. If propagate to source is not set, the error is
                          written to the CE log.
content_bytes             Total amount of content indexed in the batch.
largest_content           Within the batch, the largest document.
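Because the values after the >> marker follow the same order as the components in the table, a summary trace entry can be parsed mechanically. The following Python sketch assumes the thirteen-value layout of the sample entry above; the dictionary keys are labels chosen for this illustration, not names defined by the product.

```python
# Labels for the values after ">>", in the order they appear in the sample.
FIELDS = (
    "items", "total_ms", "create_update_in_batch", "delete_in_batch",
    "not_in_batch", "initialize_ms", "extract_text_ms",
    "extract_text_cleanup_ms", "index_ms", "update_index_counters_ms",
    "propagate_failure_codes_ms", "content_bytes", "largest_content",
)


def parse_index_batch(entry):
    """Parse the comma-separated values that follow the >> marker."""
    _, marker, tail = entry.rpartition(">>")
    if not marker:
        raise ValueError("entry has no >> marker")
    # Remove line wraps and spaces before splitting on commas.
    values = [int(value) for value in "".join(tail.split()).split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def index_share(batch):
    """Fraction of the batch duration spent in calls to Elasticsearch."""
    return batch["index_ms"] / batch["total_ms"]
```

For the sample entry, index_share reports that roughly 95 percent of the batch duration was spent in the index step, which points at the Elasticsearch side rather than text extraction when looking for a bottleneck.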
CBR moderate trace level
Following is an example of the details of each bulk API request submitted to Elasticsearch. The entry is
produced by the Elasticsearch layer (ElasticBulkAPIBatch.java) that handles submitting the bulk API
requests. For each indexing queue sweep batch there are normally multiple bulk API requests. The
requests break the larger queue sweep batch into smaller bulk API batches.
2022-02-01T18:23:23.934 000006E4 CBR FNRCE0000D - DEBUG
ElasticBulkAPIBatch.Summary elapsed: 211 ms (took 115), 40 created,
0 updated, 0 deleted, 0 ignored, errors: false, 0 failed
The elements in the entry are described in the following table. All duration values are in milliseconds.
Component   Meaning
elapsed     Elapsed time measured from the Content Platform Engine side of the request.
took        The elapsed time reported by Elasticsearch.
created     The number of new index items created.
updated     The number of existing index items updated.
deleted     The number of existing index items deleted.
ignored     The number of items that are not submitted due to some condition; for example, an update
            of a non-existent index item.
errors      True if any errors are reported by Elasticsearch for any item in the batch; otherwise, false.
failed      The number of items in the batch that failed.
CBR Detail trace level
Following is an example of a detail trace entry. The entry shows the number of documents in each batch
sent to Elasticsearch. By default, the batch size is 40. The maxRequestPayloadSize is the maximum size of
the batch that can be sent to Elasticsearch in bytes.
2022-02-23T11:20:43.858 00000181 CBR FNRCE0000D - DEBUG
ElasticBulkAPIBatch.construction, bulkAPIenabled=true,
maxRequestsPerBatch=40, maxRequestPayloadSize=25165824
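How a queue sweep batch might be broken into bulk API batches under the two limits in the detail trace (maxRequestsPerBatch and maxRequestPayloadSize) can be sketched in Python. The splitting rule used here, closing a batch as soon as either limit would be exceeded, is an assumption for illustration; the actual Content Platform Engine logic is not described in this paper.

```python
def split_into_bulk_batches(item_sizes, max_requests=40,
                            max_payload_bytes=25165824):
    """Group item sizes (in bytes) into batches that respect both limits.

    ASSUMPTION: a batch is closed when adding the next item would exceed
    the request count or the payload size.
    """
    batches, current, current_bytes = [], [], 0
    for size in item_sizes:
        full = len(current) >= max_requests
        too_big = bool(current) and current_bytes + size > max_payload_bytes
        if full or too_big:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Under these assumptions, a queue sweep batch of 200 small documents is submitted as five bulk API requests of 40 items each, while a few very large documents can force smaller batches through the payload limit alone.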
Search Logging
ElasticAPI full-text query trace
Following is an example of a moderate search trace entry for an ElasticAPI.full-text query:
2021-06-22T21:43:07.905 000000BC CBR FNRCE0000D - DEBUG
ElasticAPI.full-text query returned 2000 hits in 402 ms (parsing 20
ms) [took=371, timed_out=false] query=[{"track_scores": true,
"_source": "object_id", "size" : 2000, "query": { "query_string" :
{ "fields" :
["fncm_content","fncm_content.english","fncm_content.french"],
"default_operator": "and", "analyze_wildcard" : true, "query" :
"kirk OR storage"}}, "search_after": [4.2998805,"7A2F1160-000E-CFC5-
A480-9DBCE048DB5F"], "sort": [{"_score": "desc", "object_id":
"asc"}]}]
Component                 Meaning
returned x hits in y ms   Provides the number of hits returned by the search and how long the
                          search took in milliseconds.
parsing                   The time spent parsing the JSON results returned from Elasticsearch; that is,
                          the time to convert the JSON to results consumable by the Content Platform
                          Engine.
took                      The time Elasticsearch spent processing the query. Elasticsearch supplies
                          this value as part of the search response.
                          The complete duration minus the sum of the took time and the parsing time
                          is the amount of time the search was in transit between the
                          Content Platform Engine and Elasticsearch; that is, the send/receive time.
timed_out                 False if the search completed in a timely manner.
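The send/receive time described for the took component can be computed directly from a trace entry. The following Python sketch assumes the entry keeps the wording of the sample above; the function name is hypothetical.

```python
import re

# Matches "returned N hits in T ms (parsing P ms) [took=K," across line wraps.
_TRACE = re.compile(
    r"returned\s+(?P<hits>\d+)\s+hits\s+in\s+(?P<total>\d+)\s+ms\s+"
    r"\(parsing\s+(?P<parsing>\d+)\s+ms\)\s+\[took=(?P<took>\d+),"
)


def transit_ms(entry):
    """Return total duration minus the took time and the parsing time."""
    match = _TRACE.search(entry)
    if match is None:
        raise ValueError("not a full-text query trace entry")
    total = int(match["total"])
    return total - int(match["took"]) - int(match["parsing"])
```

For the sample entry the calculation is 402 - 371 - 20, so about 11 milliseconds were spent in transit between the Content Platform Engine and Elasticsearch.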
CSEElasticQuery summary
If there is search activity, the following kinds of trace lines are produced once every two minutes. There
is one line for each of the three types of searches:
• Normal
• Scalar
• Count (estimation)
Each line shows the total hits and duration for the given search type. For normal queries, the number of
hits consumed by Content Platform Engine is also shown (get next result count).
2020-10-21T16:40:45.352 00000142 CBR FNRCE0000D - DEBUG
CSEElasticQuery.summary: 7 queries returned 5567 hits in 5602 ms,
get next result count 4377
2020-10-21T16:40:45.352 00000142 CBR FNRCE0000D - DEBUG
CSEElasticQuery.summary: 0 scalar queries returned 0 hits in 0 ms
2020-10-21T16:40:45.352 00000142 CBR FNRCE0000D - DEBUG
CSEElasticQuery.summary: 0 estimation queries returned 0 hits in 0
ms
Tuning
Elasticsearch tuning is described in the following article. There are no specific tuning recommendations
for the Content Platform Engine.
https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html
Known Issues
Read Time-out
In some cases, Elasticsearch-related read time-out error messages might appear in the Content Platform
Engine logs. The errors occur during indexing, when it takes longer than the default read timeout value
of 90 seconds to complete an operation on the Elasticsearch side. The read timeout value can be
changed by setting a JVM argument on all Content Platform Engine servers. The following example changes
the timeout to 120 seconds:
-Decm.elasticsearch.read.timeout.ms=120000
Index Statistics
You can retrieve the count of objects in an Elasticsearch index in ACCE by selecting the Properties tab
under Object Store > Administrative > Index Area > Index. Edit the Elasticsearch Indexes property and
select an index from the list:
• Option 1 for documents
• Option 2 for annotations
• Option 3 for custom objects
• Option 4 for folders
You can now find the count of objects under the Indexed Object Count property on the Index tab.
Alternatively, use the OpenSearch REST API to retrieve the count of objects in an Elasticsearch index. For
example, query the /_cat/indices endpoint.
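As a minimal sketch of the REST approach, the following Python function extracts per-index document counts from a /_cat/indices response requested in JSON form (for example, GET /_cat/indices/fncm_*?format=json). The index names in the usage note are hypothetical, and the docs.count value reported by the search engine might not match the Indexed Object Count shown in ACCE exactly.

```python
import json


def fncm_doc_counts(cat_indices_json, prefix="fncm_"):
    """Map index name to docs.count for indexes with the given prefix."""
    rows = json.loads(cat_indices_json)
    counts = {}
    for row in rows:
        name = row.get("index", "")
        if name.startswith(prefix):
            # docs.count is returned as a string and can be null.
            counts[name] = int(row.get("docs.count") or 0)
    return counts
```

Given a response body such as '[{"index": "fncm_example_document", "docs.count": "742"}]', the function returns {"fncm_example_document": 742}.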
The indexing queue sweep counters (such as objects examined, and objects processed) cannot be used
to determine the exact number of objects that are indexed. This is because of how a queue sweep
handles batches, failures, retries, ignored documents, and so on.
Content Truncation
The Content Platform Engine limits the size of an indexing request sent to Elasticsearch by truncating the
extracted content if needed. This is done to avoid exceeding the Elasticsearch maximum payload size (set
with the http.max_content_length parameter). The default maximum size for Content Platform Engine
properties and content is 77,000,000 characters. The maximum size can be altered using a JVM
parameter. For example, the following statement limits the size to 5,000,000 characters:
-Decm.elasticsearch.content.payload.max.chars=5000000
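The effect of the limit can be illustrated with a short Python sketch. It truncates a single extracted text to a character limit and reports whether truncation occurred; how the Content Platform Engine divides the limit between properties and content is not described here, so this is a simplification under that assumption.

```python
DEFAULT_MAX_CHARS = 77_000_000


def truncate_content(text, max_chars=DEFAULT_MAX_CHARS):
    """Return the text cut to max_chars and a flag telling if it was cut."""
    if max_chars < 0:
        raise ValueError("max_chars must not be negative")
    truncated = len(text) > max_chars
    return text[:max_chars], truncated
```

Text beyond the limit is not sent for indexing, so terms that appear only in the truncated tail of a very large document cannot be found by a content-based search.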