Contextual Intent and Entity Identification from Emails for Business Process Automation
Shipping and Logistics Company
TensorFlow, Rasa, Java, Python, spaCy
Shipping and Logistics
A German international shipping and container transportation company transports containerized cargo on several major trade routes worldwide. It has a prominent market presence and brand recognition in Latin America, the Middle East, and the trans-Atlantic and trans-Pacific trades.
The customer serves a vast end clientele through different departments catering to various business needs. One of the pain areas was handling requests for data corrections: users had to manually read the content of emails, identify the modifications needed, and then update the details in the management system as requested.
The customer wanted to minimize the manual effort involved, make the processes leaner, and automate some of their existing approaches using artificial intelligence and natural language processing.
The most critical challenge was the nature of the data provided. It came as email files, each of which could contain multiple emails in a single thread, along with attachments and irrelevant information.
Because the emails contained a large amount of information, finding a way to extract only the required entities and intents was challenging, and different custom approaches had to be applied to the data extracted from the emails.
For the Proof of Concept, only a limited set of emails was provided, which is another common hurdle when training Machine Learning models.
Email data ingestion and extraction added their own complexity.
Our team started by analyzing the email data provided and identifying the relevant information that could be extracted from it. The first task was to leverage an open-source message-extraction library to accurately extract all the data contained in the emails and convert it into a text file for further processing. Using this service/processor, our team extracted bulk data into a text file. We then needed to extract only specific intents such as change, amend, and remove, which do not have any key associated with them, while only a few of the entities present in the paragraph content needed to be extracted.
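The case study does not name the extraction library precisely, so the step above can be illustrated with Python's standard-library email parser; the sample message and its fields are hypothetical:

```python
# Sketch of the email-to-text extraction step using Python's stdlib parser.
# The actual library used in the project is not named; the message content,
# sender, and field names below are illustrative assumptions.
from email import message_from_string
from email.policy import default

RAW_EMAIL = """\
From: ops@example.com
Subject: Amend booking details
Content-Type: text/plain

Please amend the container weight.
Booking No: HL1234567
Weight: 21500 kg
"""

def email_to_text(raw: str) -> str:
    """Parse a raw RFC 822 message and return its plain-text body."""
    msg = message_from_string(raw, policy=default)
    parts = []
    for part in msg.walk():
        # Keep only text parts; skip attachments and other content types.
        if part.get_content_type() == "text/plain":
            parts.append(part.get_content())
    return "\n".join(parts)

text = email_to_text(RAW_EMAIL)
print(text)
```

In the project, the extracted body of every email was appended to a bulk text file, which became the input for the key-value extraction stage.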
To solve this problem, regular expressions were used to parse the data and extract only the required information from the text file in the form of key-value pairs. The data extracted from all the emails was stored in a JSON file so that it could be accessed at later stages. Extraction accuracy on the emails provided was above 95%.
In the next step, we used Rasa, an open-source framework for conversational artificial intelligence. We split the data provided into train, test, and validation sets, and used data augmentation techniques to generate additional data for the machine learning models.
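The splitting and augmentation steps can be sketched in plain Python; the split ratios, sample utterances, and synonym table below are illustrative assumptions, not the project's actual data:

```python
# Sketch of the train/test/validation split and a simple synonym-swap
# augmentation. Utterances, intents, ratios, and synonyms are assumptions.
import random

EXAMPLES = [
    ("Please amend the booking weight", "amend"),
    ("Kindly change the vessel name", "change"),
    ("Remove the duplicate container entry", "remove"),
    ("Please change the port of discharge", "change"),
    ("Amend the consignee address", "amend"),
    ("Remove the extra booking line", "remove"),
]

SYNONYMS = {"Please": ["Kindly"], "Remove": ["Delete"]}

def augment(examples):
    """Create extra utterances by swapping in synonyms for the first word."""
    extra = []
    for text, intent in examples:
        first, _, rest = text.partition(" ")
        for alt in SYNONYMS.get(first, []):
            extra.append((f"{alt} {rest}", intent))
    return examples + extra

def split(examples, train=0.6, test=0.2, seed=42):
    """Shuffle deterministically and split into train/test/validation sets."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_test = int(len(data) * test)
    return (data[:n_train],
            data[n_train:n_train + n_test],
            data[n_train + n_test:])

augmented = augment(EXAMPLES)
train_set, test_set, val_set = split(augmented)
print(len(augmented), len(train_set), len(test_set), len(val_set))
```

In practice the augmented utterances would be written out in Rasa's NLU training-data format rather than kept as Python tuples.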
We trained, tested, and validated our model, and then performed hyperparameter tuning to get the desired result. Accuracy was modest at the initial stage but improved with continuous iterations, in line with the expectations set.
This model was then exposed as a service using Flask and further integrated with the data ingestion component to automate the end-to-end process.
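The project used Flask for this service; the sketch below shows the same request/response contract using only the standard library so it stays self-contained. The `/parse` endpoint, the payload shape, and the keyword-matching `classify()` stand-in for the trained model are all assumptions:

```python
# Minimal HTTP service sketch mirroring the Flask deployment described above.
# Endpoint path, JSON payload shape, and classify() are illustrative stand-ins.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(text: str) -> dict:
    """Stand-in for the trained intent model: naive keyword matching."""
    for intent in ("change", "amend", "remove"):
        if intent in text.lower():
            return {"intent": intent, "confidence": 1.0}
    return {"intent": "unknown", "confidence": 0.0}

class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/parse":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        result = classify(body.get("text", ""))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port: int = 0) -> HTTPServer:
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ParseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would POST `{"text": "Please amend the weight"}` to `/parse` and receive the predicted intent as JSON, which is the contract the data ingestion component integrates against.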