© Copyright IBM Corporation 2023 Page 15 of 25
For more information about Analyzers, refer to the following blog: https://opensearch.org/docs/latest/
Setting the Maximum Workers for the Indexing Queue Sweep
The indexing queue sweep is created automatically during Elasticsearch indexing. The sweep is defined
with a default of eight workers. In some cases, you might need to increase the number of workers to
improve indexing throughput. It can be difficult to estimate if an increase in the indexing queue workers
impacts indexing throughput, or if the increase in the workers has a negative impact on other Content
Platform Engine operations. If you are considering increasing the number of workers, do so incrementally
while you monitor the system by using the System Dashboard or with other tools that can monitor the
Content Platform Engine system resources (CPU, memory, and other resources). Refer to the section on
Monitoring Performance for more information on tracking system performance.
Setting the Reindexing Sweep Job Parameters
The Reindexing Sweep Job can insert rows into the indexing queue very quickly, which might overload
the queue and prevent newly added objects from being indexed in a timely fashion. Use the Inter-batch
delay to limit the processing rate of the Sweep Job. For example, assume the sweep uses the following
settings:
• Workers: 1
• Batch size: 200
• Inter-batch delay: One second (1,000 milliseconds)
With these settings, the sweep rate (the rate at which rows are inserted into the indexing queue) cannot
exceed 200 per second. As a best practice, set the Inter-batch delay to 1 second (1,000 milliseconds) or
higher, set workers to 1, and set the batch size to 200.
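The 200 rows per second limit follows from simple arithmetic: one worker inserts at most one batch of 200 rows per one-second delay. The Python sketch below reproduces that calculation. The function name is hypothetical, and two things are assumptions rather than documented behavior: that the time spent processing a batch is negligible, and that the bound scales linearly with the number of workers.

```python
def max_sweep_rate(workers: int, batch_size: int, inter_batch_delay_ms: int) -> float:
    """Upper bound on rows inserted into the indexing queue per second.

    Assumption: each worker inserts one batch and then waits the full
    inter-batch delay. Batch processing time is ignored, so the real
    rate can only be lower than this bound.
    """
    if inter_batch_delay_ms <= 0:
        raise ValueError("inter_batch_delay_ms must be positive for a bounded rate")
    return workers * batch_size * 1000 / inter_batch_delay_ms


# Settings from the example: 1 worker, batch size 200, 1,000 ms delay.
print(max_sweep_rate(1, 200, 1000))  # 200.0
```

Doubling the delay to 2,000 milliseconds halves the bound to 100 rows per second, which is why a longer Inter-batch delay is the safer direction when the queue is at risk of overload.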
Tuning Content Based Retrieval Indexing with Elasticsearch for High Volume Work Loads
For high volume ingestion, migration, or re-indexing scenarios, the default configuration for the
Elasticsearch Indexing Queue Sweep might not be sufficient. A high ingest rate can cause the number of
indexing requests to build up. In turn, this can cause a significant delay in documents being included in
search results. Making the following changes can help improve indexing throughput rate:
• Increasing the number of sweep workers
This is the recommended method of tuning indexing performance. To change the maximum number
of sweep workers, go to Object Store > Sweep Management > Queue Sweeps > Elasticsearch
Indexing Queue Sweep > Properties > Maximum Sweep Workers.
• Changing the bulk batch size
The indexing rate can be affected by tuning the BulkAPI batch size. The BulkAPI batch size controls
the size of batches that Content Platform Engine submits for indexing to the Elasticsearch cluster.
The default BulkAPI batch size is 40 documents. Alter the setting using the following JVM argument:
ecm.elasticsearch.bulkapi.max.batch.size
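As a rough illustration of what the batch size changes, the Python sketch below counts how many bulk submissions a given number of documents would need, and formats the property as a JVM argument. The helper names and the value 100 are hypothetical and not recommendations. The `-D` prefix is the standard Java way to pass a system property and is assumed to apply here. The count also assumes every batch is filled to the maximum, which might not hold in practice.

```python
import math

# Property name taken from this section; default of 40 is stated above.
BULK_BATCH_PROPERTY = "ecm.elasticsearch.bulkapi.max.batch.size"
DEFAULT_BULK_BATCH_SIZE = 40


def bulk_requests_needed(documents: int, batch_size: int = DEFAULT_BULK_BATCH_SIZE) -> int:
    """Number of bulk submissions needed, assuming every batch is full."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(documents / batch_size)


def jvm_argument(batch_size: int) -> str:
    """Format the property as a Java system property argument (assumed -D form)."""
    return f"-D{BULK_BATCH_PROPERTY}={batch_size}"


print(bulk_requests_needed(10_000))       # 250 submissions at the default size
print(bulk_requests_needed(10_000, 100))  # 100 submissions at a batch size of 100
print(jvm_argument(100))
```

Fewer, larger submissions reduce per-request overhead but increase the size of each request sent to the Elasticsearch cluster, so any change is best made in small steps while monitoring the system.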
Backup and Recovery
Recommendations for backup and recovery: