Accelerating Investigator Recruitment with NLP


NLP and social graphs are effective instruments for speeding up investigator recruitment and ensuring timely pharmaceutical product releases. Read how NLP can harness the power of unstructured data and help find influential doctors who can substantially assist a clinical trial.

Igor Kruglyak is a Senior Advisor at the global IT service provider Avenga and an executive with more than 35 years of experience leading key global development and deployment projects.

Michael DePalma is the Founder and President of Pensare, LLC; Co-Founder of The Human API; the holder of three US patents; and a two-time TED speaker.

Patient recruitment is already the largest cost driver in clinical trials. One reason for this is that 86% of clinical trials fail to meet their patient enrollment deadlines. According to Industry Standard Research, many clinical trial sites can source only a few patients who meet a trial’s inclusion criteria.

As the total number of clinical trials grows every year, so does the number of patients who must be enrolled in them. According to a Research and Markets report, the clinical trial patient recruitment and retention market is expected to reach $5.3 billion by 2030, up from $3.6 billion in 2020.

To address low patient recruitment rates, one established approach is to work with physicians and identify influencers who can help with patient enrollment. But how can clinical organizations do this efficiently and ensure the timely market release of a pharmaceutical product? The answer lies in effective medical data analysis.

Information Is Not Knowledge: How to Leverage Unstructured Medical Data

Every day, the pharmaceutical industry produces large volumes of unstructured data from medical devices, wearables and sensors, health records, publications, articles, surveys, and more. About 80% of medical information remains unstructured after it is created. Moreover, IDC research projects that the amount of data generated by healthcare will grow at a compound annual growth rate (CAGR) of 36% through 2025.

Natural Language Processing (NLP) can help structure this data and prepare it for analysis. A subfield of AI, NLP enables computers to extract meaning from human language, both written and spoken. It encompasses information retrieval and extraction, lexical and semantic analysis, pattern recognition, tagging, and data mining.

These techniques can be applied to many types of documents, from electronic health records and lab reports to healthcare regulatory specifications and scientific publications. With the help of optical character recognition (OCR), NLP can be applied even when the data is stored as an image: OCR digitizes the text in scanned images and makes it electronically editable. Once images and typewritten documents have been pre-processed with OCR, NLP can cluster the texts and build connections between the entities and objects in the dataset.
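To illustrate the clustering step, here is a minimal sketch that assumes OCR has already turned each scanned page into a plain-text string (the OCR engine itself is not shown). It groups documents with a simple leader-clustering rule over bag-of-words cosine similarity. The function names, the 0.3 threshold, and the sample documents are hypothetical; production systems would more likely use TF-IDF or embedding vectors with an established algorithm such as k-means.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into alphanumeric tokens, and count them."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(texts: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Put each text into the first cluster whose leader (first member) is
    similar enough; otherwise start a new cluster. Returns lists of indices."""
    vectors = [bag_of_words(t) for t in texts]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical OCR output for three scanned documents.
docs = [
    "Lab report: elevated HbA1c, type 2 diabetes follow-up",
    "Discharge summary: type 2 diabetes, insulin adjusted",
    "Radiology report: chest X-ray shows no acute findings",
]
print(leader_cluster(docs))  # → [[0, 1], [2]]
```

The two diabetes documents share enough vocabulary to land in one cluster, while the radiology report starts its own. Leader clustering is order-dependent, which is acceptable for a sketch but one reason real pipelines prefer standard clustering algorithms.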

How NLP Works with Unstructured Medical Datasets

Six NLP techniques are commonly applied to medical data sources such as PubMed, Google Scholar, and anonymized records from medical practices:

  • Named entity recognition detects objects within a dataset, such as doctors’ names, biomolecular targets, and the organic molecules and compounds used in drugs, as well as other entities like locations and phone numbers. For example, named entity recognition can list the names of doctors who specialize in certain medical fields.
  • Semantic parsing analyzes the structure and meaning of texts. Once named entity recognition has identified the parts of a text, semantic parsing converts natural language into logical forms and determines the meaning of each part and how the parts relate to one another.
  • Topic modeling extracts topics from text. It automatically detects the relevant topics in a dataset by observing which words are used, making it possible to uncover hidden patterns and to summarize, understand, and categorize large amounts of data.
  • Keyword extraction determines the essential keywords and keyphrases used across the dataset. It gives researchers a set of data points that help them better understand unstructured medical information.
  • Document summarization produces a short description of each unit in the dataset, where a unit can be a few sentences, a paragraph, or one or more documents. Its key goal is to identify the sentences that express the essence of the unit and thus convey the information vital to understanding it.
  • Relationship extraction determines the relationships between entities in a dataset based on predefined criteria such as biomolecular targets, doctors’ names, and locations. These relationships are used to build investigator directories, run pharmaceutical compound searches, support intelligence analysis, and more.
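Two of these techniques, named entity recognition and relationship extraction, can be sketched together in a few lines. To stay self-contained, this minimal sketch uses a hand-written dictionary (gazetteer) of hypothetical names instead of a trained NER model, and it assumes that a doctor mentioned in the same sentence as another entity is related to it. Both simplifications are illustrative assumptions, not how a production system would work.

```python
import re
from itertools import combinations

# Hypothetical gazetteer; a production pipeline would use a trained NER model.
GAZETTEER = {
    "DOCTOR": ["Jane Smith", "Alan Lee"],
    "TARGET": ["EGFR", "PIK3CA"],
    "LOCATION": ["Boston", "Chicago"],
}


def tag_entities(sentence: str) -> list[tuple[str, str]]:
    """Dictionary-based NER: return (label, name) pairs in text order."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
                found.append((match.start(), label, name))
    return [(label, name) for _, label, name in sorted(found)]


def extract_relations(text: str) -> set[tuple[str, str, str]]:
    """Co-occurrence relationship extraction: link each doctor to every other
    entity in the same sentence, as (doctor, entity label, entity) triples."""
    relations = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = tag_entities(sentence)
        for (label_a, name_a), (label_b, name_b) in combinations(entities, 2):
            if name_a == name_b:
                continue
            if label_a == "DOCTOR":
                relations.add((name_a, label_b, name_b))
            if label_b == "DOCTOR":
                relations.add((name_b, label_a, name_a))
    return relations


text = ("Jane Smith studied EGFR inhibitors in Boston. "
        "Alan Lee reviewed PIK3CA mutations with Jane Smith.")
for relation in sorted(extract_relations(text)):
    print(relation)
```

Triples such as ("Alan Lee", "DOCTOR", "Jane Smith") are exactly the kind of edges an investigator directory or social graph can be built from.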

Combined with a social graph, these techniques can be used to map the most influential authors and the doctors with previous experience in clinical research.

In the hands of experienced software providers, today’s technology can give CROs and pharmaceutical companies a more efficient way to enroll patients in clinical trials. Advanced NLP techniques and machine learning algorithms applied to unstructured medical data can help solve the significant, pervasive problems of slow patient recruitment and delayed product release.

Clinical Trial Challenges That Social Graphs Help to Solve

A social graph is a complex data structure that represents the relationships between different objects, including people, pages, events, and photos. The best-known example is the Facebook social graph, which connects over 2.7 billion monthly active users. Similar social graphs can be created for different industries and organizations, including the pharmaceutical industry.
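As a data structure, a social graph can be sketched as an adjacency list with typed nodes and labeled edges. The SocialGraph class in this sketch, its relation labels ("coauthor", "affiliated"), and the sample people and site are hypothetical; in practice a graph database or graph library would normally fill this role.

```python
from collections import defaultdict
from typing import Optional


class SocialGraph:
    """Undirected graph stored as adjacency sets of (relation, neighbor) pairs."""

    def __init__(self) -> None:
        self.node_type: dict[str, str] = {}
        self.adjacency: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_node(self, node: str, node_type: str) -> None:
        self.node_type[node] = node_type

    def add_edge(self, a: str, b: str, relation: str) -> None:
        # Store the edge on both endpoints because the graph is undirected.
        self.adjacency[a].add((relation, b))
        self.adjacency[b].add((relation, a))

    def neighbors(self, node: str, relation: Optional[str] = None) -> set[str]:
        """Neighbors of a node, optionally restricted to one relation label."""
        return {other for rel, other in self.adjacency.get(node, set())
                if relation is None or rel == relation}

    def degree_ranking(self, node_type: str) -> list[tuple[str, int]]:
        """Rank nodes of one type by their number of labeled edges (degree)."""
        ranked = [(node, len(self.adjacency.get(node, set())))
                  for node, kind in self.node_type.items() if kind == node_type]
        return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


graph = SocialGraph()
for person in ["Jane Smith", "Alan Lee", "Maria Gomez"]:
    graph.add_node(person, "investigator")
graph.add_node("Site A", "clinical_site")
graph.add_edge("Jane Smith", "Alan Lee", "coauthor")
graph.add_edge("Jane Smith", "Maria Gomez", "coauthor")
graph.add_edge("Jane Smith", "Site A", "affiliated")
graph.add_edge("Alan Lee", "Site A", "affiliated")

print(graph.degree_ranking("investigator"))
# → [('Jane Smith', 3), ('Alan Lee', 2), ('Maria Gomez', 1)]
```

Degree is only the simplest measure of how connected a node is; it ignores how important the connected nodes themselves are.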

In pharma, a social graph can visually depict the connections between investigators, patients, and clinical organizations. Impact factor measurement helps identify the most influential doctors in the graph: it counts the references to a particular author, assigns a numerical weight to each data source according to its relative importance, and combines the two to identify influencers.
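One way to read the impact factor measurement described above is as a weighted citation count, where each reference to an author contributes the weight of the data source it appears in. This minimal sketch assumes that reading; the source categories, their weights, and the citation records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relative weights per type of data source.
SOURCE_WEIGHTS = {"journal": 1.0, "conference": 0.5, "preprint": 0.25}


def influence_scores(citations: list[tuple[str, str]],
                     weights: dict[str, float] = SOURCE_WEIGHTS) -> list[tuple[str, float]]:
    """Sum the source weights over all references to each author and rank
    authors from most to least influential. Unknown sources count as 0."""
    scores: dict[str, float] = defaultdict(float)
    for author, source in citations:
        scores[author] += weights.get(source, 0.0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Each record: (cited author, type of source the reference appears in).
citations = [
    ("Jane Smith", "journal"), ("Jane Smith", "journal"),
    ("Alan Lee", "conference"), ("Alan Lee", "conference"), ("Alan Lee", "preprint"),
    ("Maria Gomez", "journal"),
]
print(influence_scores(citations))
# → [('Jane Smith', 2.0), ('Alan Lee', 1.25), ('Maria Gomez', 1.0)]
```

With these sample weights, Alan Lee has the most raw citations (three) but ranks second, because his references come from lower-weighted sources. PageRank-style scores, which also account for how influential the citing authors are, are a common alternative to this flat weighting.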

This allows sponsors to invite investigators who have researched particular topics and could make a substantial contribution, even if they have not been involved in similar studies before. In turn, this wider reach can lead to higher patient engagement rates and, as a result, speed up the challenging clinical trial process.
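The invitation step can be sketched as a filter on research topics that deliberately ignores prior trial experience, followed by a ranking on topic overlap and then influence. The Investigator fields, the topic labels (which could come from keyword extraction), the influence values, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Investigator:
    name: str
    topics: set[str]     # hypothetical: e.g. keywords extracted from publications
    influence: float     # hypothetical: e.g. a weighted citation score
    prior_trials: int    # previous trials run; intentionally NOT used as a filter


def shortlist(candidates: list[Investigator], required_topics: set[str],
              min_overlap: int = 1) -> list[Investigator]:
    """Keep investigators whose research topics overlap the trial's topics,
    ranked by overlap size, then influence, then name."""
    matches = [c for c in candidates if len(c.topics & required_topics) >= min_overlap]
    return sorted(matches, key=lambda c: (-len(c.topics & required_topics),
                                          -c.influence, c.name))


candidates = [
    Investigator("Jane Smith", {"EGFR", "NSCLC"}, 2.0, 3),
    Investigator("Alan Lee", {"PIK3CA", "HR+ breast cancer"}, 1.25, 0),
    Investigator("Maria Gomez", {"PIK3CA"}, 1.0, 5),
    Investigator("Omar Haddad", {"cardiology"}, 3.0, 8),
]
for inv in shortlist(candidates, {"PIK3CA", "HR+ breast cancer"}):
    print(inv.name, inv.prior_trials)
```

Here Alan Lee ranks first despite having no prior trials, while the most influential candidate is excluded because his topics do not match. Weighting topic overlap ahead of influence is one design choice among many.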

Practical Implementation of NLP and the Social Graph

AI has already proven helpful in many medical fields. For example, pattern recognition and segmentation techniques applied to medical images such as retinal scans enable faster diagnosis and tracking of disease progression. Now, to automate processes, gain deeper insights, reduce risks, and lower costs, many leading pharma companies have started to integrate NLP into their everyday work.

“In the clinical domain, researchers have used NLP systems to identify clinical syndromes and common biomedical concepts from radiology reports, discharge summaries, problem lists, nursing documentation, and medical education documents. Different NLP systems have been developed and utilized to extract events and clinical concepts from text … Success stories in applying these tools have been reported widely,” state the authors of a review of research on clinical information extraction applications, published in the Journal of Biomedical Informatics.

One pharmaceutical manufacturing company that has combined NLP techniques with a social graph is Avenga’s customer QPharma, which used them to build a database of key opinion leaders who can be engaged for clinical research and marketing activities, speeding up investigator recruitment. “Affordability and depth of expertise have made Avenga a critical development partner. Their team easily scales to accommodate project size and is equally flexible with scheduling across time zones,” says QPharma’s CTO Suhail Mughal. Success stories like this one show that NLP and social graph technology can be effective instruments for improving the chances of successful, timely patient recruitment and, accordingly, of successful clinical research.
