Reactive Product Development
Partner : Elsevier
Website : https://www.elsevier.com/
Technologies Used : Scala, Apache Spark, Quetzal, Spark ML, GridGain (Apache Ignite), PostgreSQL, Akka, Akka HTTP, Cassandra, Solr, Zeppelin
Domain : Semantic Web
Elsevier is an information and analytics company and one of the world's major providers of scientific, technical, and medical information. Elsevier provides products and services which include electronic and print versions of journals, textbooks, and reference works, and cover the health, life, physical and social sciences. In the primary research market during 2016, researchers submitted over 1.5 million research papers to Elsevier-based publications. Over 20,000 editors managed the peer review and selection of these papers, resulting in the publication of more than 420,000 articles in over 2,500 journals.
Elsevier's customers operate in an increasingly challenging market. In their biggest customer segment, the pharmaceutical industry, it now takes upwards of 12 years and $2 billion to bring a single new drug to market, and even then there is no guarantee that payers will allow physicians to prescribe it. The attrition rate for drug candidates, that is, the number of candidates you start with for each successful launch, can be of the order of 10,000:1. The "old thinking" was to stack the odds by simply churning out more candidates, but that equation is broken by longer development times, greater regulatory pressure and greater costs overall. The solution has to be to make fewer, better "bets", and that requires information: data that can be used predictively to decide which drug programs to progress and which to kill.
Customers are well aware that a lot of this data already exists. It sits in MS Office documents on shared drives, in SharePoint or Documentum document management systems, in assay databases and in clinical trial results. But this data was created to serve the needs of siloed groups who used to "chuck it over the wall" to the next department in the drug development process, and no system can "talk" to another. Customers read the hype in the press about AI, Big Data and Cognitive Computing. They can see the value of these technologies, but they know they are very far from that dream. Many IT Directors in Pharma will tell you they can't even find the data, let alone compute on it, and when scientists can't find the data they simply redo the work, wasting money on duplication.
Elsevier needed a platform that can join up the silos that are frustrating its customers and make that data findable and computable. Such a platform can connect other data too: the information resources put up by organisations like the National Institutes of Health and the European Molecular Biology Organisation. It can do the "heavy lifting" of connecting and enriching data across silos so that the Data Scientists in customer organisations, and in Elsevier's own Professional Services organisation, can focus on answering the real questions they need to answer. It can even provide a resource for third parties to build applications that operate on the connected data, adding more value to the investment in the platform. That platform is ELSSIE, and that is what needed to be built.
Knoldus worked on two key parts of ELSSIE:
- ML Workbench
- SPARQL engine (Quetzal)
ELSSIE is a platform that connects information from multiple sources in knowledge graphs maintained by Elsevier's Subject Matter Experts (SMEs), and enables users to find and analyze content across traditional data silos to derive new insights.
ML Workbench is an application that ingests and enriches content so that the "heavy lifting" of data preparation is taken care of, freeing scarce data scientist resources to focus on their analyses. It was built using Spark ML/MLlib and functional programming in Scala, to attain massive scalability and speed when running ML algorithms.
The SPARQL engine (Quetzal) was a CLI for querying knowledge graphs that leverage Elsevier's domain knowledge and ontologies to connect data across the entire spectrum of drug discovery and development. Inside it we made extensive use of Spark SQL, Scala's functional programming and futures, and Akka wherever needed, to extract, transform and load millions of records/sec onto GridGain (Apache Ignite) for running SPARQL queries.
The technology stack and architecture met the SLAs required for the platform. The application was well suited to a reactive design, and the Lightbend reactive technology stack helped achieve it. The platform is currently in beta and is performing well. It saves a great deal of human effort in extracting relevant data from research papers.
Get In Touch
If you are looking to build a Reactive Product with Scala, Akka, Play Framework or a Big Data solution leveraging Spark, Knoldus is here to help. We are a proven, experienced Certified Lightbend Partner, available for partnering to make your product a reality. Get in touch with us here, follow us @Knolspeak, or just send us an email at email@example.com.