TNA evaluates Records365 Classification Intelligence

The AI for Digital Selection project focuses on duplicate detection, entity extraction, and classification.

The National Archives

TNA is a non-ministerial department and the official archive and publisher for the UK Government, and for England and Wales. It is the guardian of over 1,000 years of iconic national documents, from Shakespeare’s will to tweets from Downing Street, and preserves them for generations to come.

Location
Richmond, UK
Industry
Government
Size
500+

Helping secure 1,000 years of history

The National Archives (TNA) is the official archive and publisher for the UK government, and for England and Wales, holding official records containing 1,000 years of history. Its role is to collect and secure the future of the government record, both digital and physical, to preserve it for generations to come, and to make it as accessible and available as possible.

TNA holds over 11 million historical and government records, employs approximately 550 staff, and currently welcomes approximately 80,000 visitors per year.

A significant part of TNA's role is accessioning records from across government into its collection. All government departments are required to pass records to TNA for future preservation. Until recently, records were on paper; however, digital and born-digital records make up a growing proportion of the record set and will eventually all but replace paper.

TNA recognizes the challenges ahead, and that managing the classification and preservation of records will require the use of artificial intelligence (AI).

To help TNA better understand solutions that can increase its ability to use AI tools to appraise and select data for permanent preservation, RecordPoint was invited to take part in the AI for Digital Selection project with its Records365 service and Classification Intelligence.

RecordPoint’s layer of intelligence appraises the value and risk of high volumes of information

Records365 is a cloud-based software-as-a-service platform that can connect to multiple content sources to enable organizations to apply federated governance across all their information, regardless of where it lives.

To help customers like TNA with the challenges faced as part of a digital transformation journey, RecordPoint is committed to bringing customers continuous innovation by delivering solutions that:

  • Centralize content from all sources and make insights visible in user-friendly dashboards
  • Help organizations lower costs through efficiency improvements
  • Are secure and support regulatory compliance and data security.

As part of the project, TNA has provided RecordPoint with samples of labelled and unlabeled data that we have used to demonstrate the Records365 machine learning (ML) capabilities and increase TNA’s understanding of how to leverage AI using the following approach:

Load Retention Schedule: Using the retention schedules spreadsheet provided, we loaded each disposal class and retention schedule into the Records365 global File Plan.

Create Rules for Labelled Dataset: To automatically assign a disposal class and retention schedule to the labelled data in Records365, a set of declarative rules was created in the Records365 rules tree. Each rule mapped a document to a specific disposal class using the document's metadata.
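The idea of a declarative, metadata-based rule can be sketched in a few lines of Python. This is an illustration only: the rule format, the metadata field names (`department`, `doc_type`), and the disposal class names are all hypothetical and do not reflect the Records365 rules tree syntax or TNA's retention schedule.

```python
from typing import Optional

# Hypothetical rules: each pairs required metadata values with a disposal class.
# Field names and class names are invented for illustration.
RULES = [
    ({"department": "Finance", "doc_type": "Invoice"}, "Financial Records"),
    ({"department": "HR"}, "Personnel Records"),
]


def assign_disposal_class(metadata: dict) -> Optional[str]:
    """Return the disposal class of the first rule whose conditions all match."""
    for conditions, disposal_class in RULES:
        if all(metadata.get(field) == value for field, value in conditions.items()):
            return disposal_class
    return None  # no rule matched; the document stays unclassified
```

Because the rules are plain data rather than code, they can be reviewed and changed by a records manager without touching the matching logic.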

Import Labelled Dataset: Since the data was provided on a hard drive, we decided for this project to load the labelled dataset from a Windows file share using the Records365 FileConnect connector. Once the connector was enabled and the documents were added to the file share, FileConnect looked for redundant, obsolete, and trivial (ROT) documents. The FileConnect ROTBot performed deduplication, enriched documents with additional metadata, and automatically submitted them to Records365. Once processed by the Intelligence Engine, each document was classified according to the rules created previously.
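The source does not describe how the ROTBot detects duplicates. One common approach, shown here purely as an assumption, is to compare a cryptographic hash of each file's bytes, which catches exact duplicates only. The function name and the in-memory `documents` mapping are hypothetical.

```python
import hashlib


def find_duplicates(documents: dict) -> dict:
    """Map each duplicate document name to the first document with identical bytes."""
    first_seen = {}   # content digest -> name of the first document with that content
    duplicates = {}   # duplicate name -> name of the original
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates
```

Exact hashing will not flag near-duplicates, such as the same text saved in two formats; detecting those would need a similarity measure rather than a digest comparison.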

Train Model on Labelled Dataset: The Records365 Classification Intelligence capabilities were designed to be used by compliance and records management teams without requiring the involvement of a data scientist. The model was trained by selecting the disposal classes in the File Plan that had enough data samples. Records365 handles the rest of the processing automatically, without user intervention.

Apply ML to Unlabeled Dataset: Once the model was trained, we submitted the unlabeled dataset to Records365 using the same Windows file share and FileConnect connector. As before, when content was added to the file share, the FileConnect ROTBot performed deduplication and named entity extraction to enrich the context of each document for use in e-discovery. Once the documents were received by Records365, the Intelligence Engine applied the machine learning model to each unlabeled document to suggest a relevant category. The records management team remains fully in control of final decisions and can review each suggestion, accepting or correcting it. This feedback loop is then used to improve the model over time.
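The review step can be pictured as a simple human-in-the-loop record. The sketch below is an assumption about how such feedback might be captured, not a description of Records365 internals; the `Suggestion` type, its fields, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    document_id: str
    suggested_class: str
    final_class: Optional[str] = None  # set once a records manager reviews it


def review(suggestion: Suggestion, corrected_class: Optional[str] = None) -> Suggestion:
    """Accept the suggestion as-is, or override it with a corrected class."""
    suggestion.final_class = corrected_class or suggestion.suggested_class
    return suggestion


def training_examples(reviewed: list) -> list:
    """Collect (document_id, final_class) pairs from reviewed suggestions only."""
    return [(s.document_id, s.final_class) for s in reviewed if s.final_class is not None]
```

Only reviewed suggestions become new labelled examples, so the model is retrained on decisions a person has confirmed rather than on its own unverified output.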

Key observations and findings

The experiments undertaken during this project produced the following key results and findings:

  • Identified candidate records for permanent preservation
  • Detected duplicates for disposition
  • Overall training accuracy of 74.5%; test accuracy of 71.8%
  • Extracted entities: organizations, geopolitical entities, people
  • File analysis: content size summary, age summary
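For readers unfamiliar with the accuracy figures above: accuracy is conventionally the share of documents whose predicted class matches the true label, measured once on the data the model was trained on and once on held-out test data. The source does not state how its figures were computed, so the following is a generic sketch with a hypothetical function name and made-up labels.

```python
def accuracy(predicted: list, actual: list) -> float:
    """Fraction of predictions that equal the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

A training accuracy only slightly above the test accuracy, as reported here, generally suggests the model is not heavily overfitted to the labelled sample.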

Future research and development

In addition to the intelligent capabilities available in Records365 today, RecordPoint is making significant further investments in AI. We understand that organizations still struggle to control their information and make meaningful business decisions because of the sheer number of content sources they deal with every day, which contain structured, semi-structured, and unstructured content.

Some of the capabilities that customers can expect to see in Records365 in the future are:

  • Context enrichment
  • Multi-model appraisal
  • Unsupervised learning
  • Searchable knowledge graph
  • Multi-dimensional appraisal
  • Language models
  • AI-driven content analytics
  • Intelligent connectors
  • AI-based risk and value scoring

We believe that machine learning capabilities will be at the core of helping organizations to reduce their current risk and make better decisions faster. To do so, those capabilities need to be explainable and easy to use.
