Xenit's Breakthrough In Large Active Archives: A Report From Hyland Summit 2023

Introduction
Xenit’s Alfresco customers presented at three Hyland Summits in Europe in October 2023. Oxford University Press shared the story of their journey to AWS in London on 10 October, and MNT discussed their upgrade to Alfresco 6.2 in Paris on 19 October.

At the Hyland Summit 2023 in Düsseldorf, the spotlight was on the archive we built for Ethias together with our partner NRB. Marc De Vriendt (NRB) and Wim Fabri (Xenit) jointly presented this use case under the title “A Very Large Active Archive.” The presentation captured Xenit’s innovation, expertise, and commitment to delivering state-of-the-art Alfresco solutions.
Ethias: Belgium’s Preferred Insurance Brand
Ethias stands tall as the favorite insurance brand of Belgium. Their Alfresco journey began in 2016, starting from a Documentum archive that had become costly to run. The initial migration covered 60 million documents and emails, under stringent security requirements aligned with benchmarks such as the OWASP Top 10. Today the archive holds around 180 million documents. Intake has doubled from about 1 million documents a month in 2017 to about 2 million a month now, which speaks volumes about the company’s expansion and success.

The Alfresco Archive: Technical Brilliance
On the technical side, the Ethias archive is a symphony of robust systems and advanced configurations. It is set up as a single three-node cluster: two nodes predominantly serve business requests, while the third serves the SOLR instances for indexing.
Diving deeper:
SOLR Sharding
Xenit leverages SOLR sharding to support more intricate queries than transactional database queries allow. We index metadata almost exclusively and keep the full-text index minimal. SOLR sharding splits the index into manageable parts, each served by a dedicated SOLR instance. This keeps queries flowing smoothly and delivers search response times under three seconds, even with an archive of this magnitude.
Infrastructure
On the infrastructure end, the system is supported by six servers running 18 SOLR instances in total. Oracle Exadata handles the database and DataCore Swarm object storage takes care of the content store. Documents are placed in a new subfolder every minute, which avoids the performance problems associated with very large folders.
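The per-minute subfolder idea can be sketched as follows. This is a minimal illustration, not the Ethias implementation: the function name `minute_folder`, the `/archive` root, and the year/month/day/hour/minute layout are all assumptions. At roughly 2 million documents a month, and assuming evenly spread intake, such a folder would hold on the order of 46 documents (2,000,000 / 43,200 minutes in a 30-day month).

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def minute_folder(root: str, ingested_at: datetime) -> PurePosixPath:
    """Return a hypothetical target folder for a document ingested at `ingested_at`.

    Assumed layout: <root>/YYYY/MM/DD/HH/mm, computed in UTC so that every
    ingesting node derives the same path for the same instant.
    A naive datetime is interpreted as local time by astimezone().
    """
    ts = ingested_at.astimezone(timezone.utc)
    return PurePosixPath(
        root,
        ts.strftime("%Y"),
        ts.strftime("%m"),
        ts.strftime("%d"),
        ts.strftime("%H"),
        ts.strftime("%M"),
    )


if __name__ == "__main__":
    when = datetime(2023, 10, 19, 14, 5, 30, tzinfo=timezone.utc)
    print(minute_folder("/archive", when))
```

Because the path is derived only from the ingestion timestamp, no folder grows after its minute has passed, which keeps folder sizes bounded by the peak ingestion rate.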

SOLR Sharding: The DBID Range Methodology
To manage the colossal archive size and maintain search efficiency, dividing the index into shards is paramount. Every document or folder node in Alfresco is assigned a sequence number, its ‘database id’ (dbid). Each shard indexes the nodes whose dbid falls within its range. With each shard covering a range of 25 million dbids, existing shards keep their assigned ranges when new shards are added, so introducing a shard does not require a complete reindex of all the others.
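The dbid range mapping can be illustrated with a short sketch. It assumes contiguous ranges that start at 0, with an inclusive lower bound and an exclusive upper bound. The function names are hypothetical and are not part of Alfresco or SOLR.

```python
SHARD_RANGE = 25_000_000  # dbids covered by each shard, as stated in the text


def shard_for_dbid(dbid: int, shard_range: int = SHARD_RANGE) -> int:
    """Return the zero-based index of the shard that would index this dbid."""
    if dbid < 0:
        raise ValueError("dbid must be non-negative")
    return dbid // shard_range


def shard_bounds(shard_index: int, shard_range: int = SHARD_RANGE) -> tuple[int, int]:
    """Return the assumed [start, end) dbid range covered by a shard."""
    start = shard_index * shard_range
    return start, start + shard_range


def shards_needed(max_dbid: int, shard_range: int = SHARD_RANGE) -> int:
    """Return how many shards are needed to cover dbids 0..max_dbid."""
    return max_dbid // shard_range + 1


if __name__ == "__main__":
    for dbid in (0, 24_999_999, 25_000_000, 180_000_000):
        print(dbid, "-> shard", shard_for_dbid(dbid), shard_bounds(shard_for_dbid(dbid)))
```

The shard for a given dbid depends only on the dbid and the fixed range size, not on how many shards exist. A new shard therefore only picks up dbids beyond the last covered range, and the existing shards keep their contents. Folders and other nodes also consume dbids, so the highest dbid in a repository can be well above its document count.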
In Conclusion
Xenit’s showcase at the Düsseldorf Hyland Summit 2023 not only highlighted our superiority in managing large-scale archives but also our commitment to innovation, performance, and client success. As we continue our journey, we remain dedicated to pushing the boundaries of what’s possible, ensuring our clients like Ethias always stay ahead of the curve.