BUILD A PLATFORM TO FIND AND ANALYZE CONTENT ACROSS TRADITIONAL DATA SILOS TO DERIVE NEW VALUE-DRIVEN INSIGHTS.
Elsevier is a world-leading provider of information solutions that enhance the performance of science, health, and technology professionals, empowering them to make better decisions and deliver better care. They want to make analysis easier for everyone, enabling them to manage their work more efficiently and spend more time making breakthroughs.
Elsevier provides products and services that help researchers, governments, universities, and healthcare professionals make discoveries, evaluate and improve their research strategies, and give physicians the insight to find the right clinical answers. Their goal is to expand the boundaries of knowledge for the benefit of humanity.
Elsevier publishes 430,000 peer-reviewed research articles annually.
Drug companies around the world make up Elsevier's largest customer segment, and drug discovery is a complex process. Developing one new drug costs about $2.6 billion, and the approval rate for drugs entering clinical development is less than 12%. The attrition rate for drug candidates, that is, the number of candidates you start with for each successful launch, can be on the order of 10,000:1.
Scientists rely on knowledge bases related to pharmacology, medicine, chemistry, and biology, as well as experimental data such as clinical trials, experimental publications, and tests performed on similar candidates. Some of these are purchased, while others are developed within the company over time. Scientists spend a large share of their expensive time searching through these knowledge bases. Take, for example, a simple question: "What are the compounds that are similar in structure to benzene, have a boiling point of more than 40 °F, and have no side effects on people with lymphoma?" Answering it requires joining information from chemistry, medicine, and pharmacology. By "joining," we mean understanding the question the way a human would, pulling the relevant information from each domain, and combining it into a definitive answer.
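To make the idea of "joining" concrete, here is a minimal Python sketch of how the example question could be answered by combining three separate silos. It is an illustration under stated assumptions, not how ELSSIE is implemented: the three dictionaries, the function name find_compounds, the condition identifiers, and every compound name and value are hypothetical and made up for this example.

```python
# Toy illustration only: every compound name, identifier, and value below is
# made up and does not come from the ELSSIE platform or any real data set.

# Chemistry silo: structural similarity and boiling point (degrees Fahrenheit).
CHEMISTRY = {
    "compound_A": {"similar_to": "benzene", "boiling_point_f": 176.0},
    "compound_B": {"similar_to": "benzene", "boiling_point_f": 35.0},
    "compound_C": {"similar_to": "ethanol", "boiling_point_f": 173.0},
    "compound_D": {"similar_to": "benzene", "boiling_point_f": 231.0},
    "compound_E": {"similar_to": "benzene", "boiling_point_f": 200.0},
}

# Medicine silo: free-text condition names mapped to a canonical concept ID.
MEDICINE = {
    "lymphoma": "COND_001",
    "lymphatic cancer": "COND_001",
    "asthma": "COND_002",
}

# Pharmacology silo: compound -> canonical conditions with known side effects.
PHARMACOLOGY = {
    "compound_A": set(),
    "compound_B": set(),
    "compound_C": set(),
    "compound_D": {"COND_001"},
    "compound_E": {"COND_002"},
}


def find_compounds(similar_to, min_boiling_point_f, condition):
    """Join the three silos to answer one cross-domain question."""
    # Medicine: resolve the user's wording to a canonical condition.
    condition_id = MEDICINE[condition.lower()]
    matches = []
    for name, props in CHEMISTRY.items():
        # Chemistry: filter on structure and boiling point.
        if props["similar_to"] != similar_to:
            continue
        if props["boiling_point_f"] <= min_boiling_point_f:
            continue
        # Pharmacology: skip compounds with unknown or adverse safety data.
        side_effects = PHARMACOLOGY.get(name)
        if side_effects is None or condition_id in side_effects:
            continue
        matches.append(name)
    return sorted(matches)


print(find_compounds("benzene", 40.0, "lymphoma"))
```

In this toy setup, the medicine silo is what lets "lymphoma" and "lymphatic cancer" resolve to the same concept before the pharmacology data is checked, which is the kind of cross-domain link a scientist would otherwise have to make by hand.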
The customer envisioned a platform that joins knowledge from different domains and makes it searchable. Its search engine would respond the way a human would: understand the question, parse it into a machine-readable query, crawl the databases, and return results along with a measure of how likely each result is to answer the customer's question. That platform is ELSSIE, and it is what Knoldus built for Elsevier.
ELSSIE is a platform that connects information from multiple sources stored in the format of a knowledge graph and maintained by Elsevier's Subject Matter Experts (SMEs). ELSSIE enables users to find and analyze content across traditional data silos to derive new value-driven insights.
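As a rough illustration of what storing connected information as a knowledge graph can look like, the Python sketch below keeps facts as subject-predicate-object triples tagged with the silo they came from, and answers a question by following a chain of relationships. The TripleStore class, its methods, and all entity and predicate names are hypothetical and chosen for this example; the source does not describe ELSSIE's actual graph model.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory knowledge graph of (subject, predicate, object) facts."""

    def __init__(self):
        # subject -> list of (predicate, object, source silo)
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj, source):
        self._by_subject[subject].append((predicate, obj, source))

    def objects(self, subject, predicate):
        """Return (object, source) pairs for one subject and predicate."""
        facts = self._by_subject.get(subject, [])
        return [(o, s) for p, o, s in facts if p == predicate]

    def follow(self, start, predicates):
        """Follow a chain of predicates from start; return the nodes reached."""
        frontier = {start}
        for predicate in predicates:
            frontier = {
                obj
                for node in frontier
                for obj, _source in self.objects(node, predicate)
            }
        return frontier


# Hypothetical facts drawn from three different silos.
graph = TripleStore()
graph.add("drug_X", "has_scaffold", "benzene_ring", source="chemistry")
graph.add("drug_X", "targets", "protein_P", source="pharmacology")
graph.add("protein_P", "associated_with", "disease_D", source="medicine")

# Which diseases are linked to drug_X through the proteins it targets?
print(graph.follow("drug_X", ["targets", "associated_with"]))
```

The traversal crosses from a pharmacology fact to a medicine fact in a single query, which is the benefit of keeping the silos in one graph rather than in separate databases.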
The ultimate goal of ELSSIE is to put complex information at scientists' fingertips so that they can carry out drug discovery at a rapid pace.
To accomplish this, the solution needs to ingest multiple kinds of structured and unstructured content; store it as queryable structured data; semantically understand the content and generate relationships by recognizing entities and concepts; interpret the stored data and offer graph query capabilities; provide an API for integration with external applications; and, finally, make it easy for scientists to search for information.
ELSSIE as a final solution included the following components:
In summary, the ELSSIE project used Apache Spark, Apache Hadoop, Apache Cassandra, Apache Kafka, Apache Solr, and Apache Ignite (GridGain), all built on AWS. The project achieved several innovations: dynamically scaled Apache Spark and Hadoop clusters, an extension of QUERTZL using ANTLR parsers, and the use of LDA together with NLP to find entities in text and their contextual meaning rather than their strictly literal meaning.
The technology stack and architecture met the SLAs required for the platform. ELSSIE saves a great deal of manual effort and time when fetching relevant data from research papers.