Tue, Oct 2, 2018 @ 22:10 PM

From Text to Insight

What does the following paragraph mean to you, and how did you come to understand it?

I’ve been seeing my current dentist (Joseph Smith) for almost three years. When I saw him in January, I noticed that he put that my visit was 30 minutes longer than it actually was while also adding two X-rays (which I never received). Additionally, he claimed to have extracted my kid’s baby teeth even though they lost them naturally. He’s also added extra visit time to their claims too.

Most healthcare fraud units receive similar complaints about providers on a daily basis, and their staff must decide on an appropriate plan of action. The above circumstance seems like a fairly blatant case of fraud: Joseph Smith is billing for services not rendered, so the fraud unit should likely open an investigation. As a data scientist, I see this manual routing as a potential bottleneck in their workflow. Could we teach a computer to do the same thing?

Literacy quickly becomes second nature after we first learn to read. We often fail to realize the true complexity of comprehension (at least until we try to learn a second language). Many sentences have similar meanings despite varied sentence structure and word choice. Take a rephrasing of the example complaint:

My dentist, Joseph Smith, has been adding additional charges to my routine check-ups. He’s been billing X-rays, which I never received. I think he's been adding more time of care than I actually received as well; I can't remember ever seeing him for more than 30 minutes. I’ve been seeing him for a while and only noticed this after my most recent appointment this January. My children that see him have claims with the same extra procedures that I don’t remember happening and have been billed for primary teeth extractions which never happened as the teeth fell out naturally.

We immediately see the use of commas instead of parentheses to set off the provider’s name, as well as the splitting of the extra care and added X-rays into separate sentences (one of which uses a semicolon). Both of these changes vastly alter the sentence structure while still retaining the majority of the meaning. Aside from the structural differences, the word “while” is used as a conjunction (“while also adding”) in the first paragraph and as a noun (“for a while”) in the second paragraph. These two usages differ in semantic content as well as part of speech. Despite this complexity, most native English speakers would understand both paragraphs easily and likely recognize them as describing the same scenario. This is not a given for most natural language processing (NLP) models.

Understanding unstructured text data is still a largely unsolved problem due to the inherent ambiguity in natural language, but we’ve come a long way in the last 20 years. This particular case (deciding to open an investigation based on a complaint) is largely solvable with Alivia’s current approaches, which leverage contemporary deep learning methods. Deep learning with word vectors first translates each word into a list of numbers, where words that have similar numbers at the same position in the list have a similar meaning. Machine learning is all math at the end of the day, so this translation is necessary to allow the algorithm to work with the semantic content of each word. We then treat the word vectors as a sequence and process them in order, because the order is important. “The human walked their dog,” for example, means something completely different than “The dog walked their human.” After we process the entirety of the complaint, the algorithm predicts whether the complaint should be followed up with an investigation.
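To make the word-vector idea concrete, here is a minimal sketch in Python. The three-number vectors are hand-made toy values chosen for illustration (an assumption, not output from a trained model or from Alivia’s system), and the function names are hypothetical. It shows two things from the paragraph above: words with similar meanings get similar lists of numbers, and a sentence becomes an ordered sequence of those lists.

```python
import math

# Hand-made toy vectors (hypothetical values, NOT from a trained model).
TOY_VECTORS = {
    "baby":    [0.9, 0.1, 0.0],
    "primary": [0.8, 0.2, 0.1],
    "x-ray":   [0.0, 0.9, 0.3],
    "human":   [0.1, 0.0, 0.9],
    "dog":     [0.3, 0.1, 0.7],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def to_sequence(sentence, vectors):
    """Map each known word to its vector, preserving word order."""
    return [vectors[word] for word in sentence.lower().split() if word in vectors]

# Similar words sit close together; unrelated words do not.
print(cosine_similarity(TOY_VECTORS["baby"], TOY_VECTORS["primary"]))
print(cosine_similarity(TOY_VECTORS["baby"], TOY_VECTORS["x-ray"]))

# Same words, different order: the sequences a model would read differ.
seq_a = to_sequence("the human walked their dog", TOY_VECTORS)
seq_b = to_sequence("the dog walked their human", TOY_VECTORS)
print(seq_a != seq_b)
```

In a real system the vectors would be learned from a large body of text and would have hundreds of dimensions; the sequence would then be fed to a model that reads it in order.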

More naïve solutions would look at word frequencies (referred to within the data science community as “Bag of Words”). Word frequencies are simply the proportional usage of each word in a given phrase, so words like “the” and “they” are often very common, while words like “fraud” and “upcoding” are generally uncommon. This approach neglects the order of the words entirely and is a less robust measure of semantic meaning. While “primary teeth” and “baby teeth” mean exactly the same thing in the two examples, word frequencies would completely disregard this association. An ideal complaint that mentions “upcoding,” “fraud,” and “unbundling” might be caught with this technique, but with many patients struggling to even understand how their coinsurance works, the likelihood that they are familiar with terminology like upcoding and unbundling is slim to none.
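A small Python sketch shows both weaknesses of word frequencies. The `bag_of_words` helper is a hypothetical name written for this illustration, using only the standard library; it is a simplified version of the technique, not any particular product’s implementation.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return each word's share of the total word count, ignoring order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}

# Word order is lost: these two sentences produce identical frequencies.
a = bag_of_words("The human walked their dog")
b = bag_of_words("The dog walked their human")
print(a == b)

# Synonyms are unrelated: the only overlap is the literal word "teeth".
shared = set(bag_of_words("primary teeth")) & set(bag_of_words("baby teeth"))
print(shared)
```

Nothing in the frequency table tells the model that “primary” and “baby” are interchangeable here, which is the gap word vectors are meant to close.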

The healthcare system and text analysis are both complicated; you need a model that can understand both and make the right prediction. As our model continues to train on your data, it’ll get even better over time, freeing up your employees to dive into the medical records and conduct audits rather than read complaints. Ask for a demo today!



Thu, Sep 6, 2018 @ 14:09 PM

Comparing Apples to Apples

Fraud detection has always been about finding the apple not like the other apples. A subset of general anomaly detection, finding fraud is sometimes as obvious as picking out the basketball-sized, orange-colored “apple” and cracking the imposter open to confirm it’s really a pumpkin. But what if a green Granny Smith apple and a fake red plastic apple both snuck into a batch of Red Delicious? The former would appear as an obvious anomaly, while the latter would require more than visual input to detect as fraud. With overly simplistic forms of fraud detection, like basic outlier detection on paid amounts, you miss this necessary nuance.

Many healthcare payors are currently leveraging various forms of outlier detection as their front-line defense for identifying fraudulent providers. While oftentimes effective, these forms of detection only work if they’re configured properly with the right sample, metrics, and underlying statistical methods. Most forms of outlier detection would flag the pumpkin. It’s a significant outlier in every dimension; apples are not orange, do not have ridges, do not weigh 10 pounds, and are not the size of basketballs.

The other two apples still pose major problems. You want to avoid false negatives (the fake plastic red apple) and false positives (the green Granny Smith apple). Without any further context, how can we avoid identifying the Granny Smith as anomalous? This parallels comparing outliers for a given procedure without first taking provider specialty into account. Oral surgeons perform various surgeries far more often than general dentists, so grouping both specialties together as one sample for outlier detection will falsely flag many oral surgeons as fraudulent, while dentists who perform a surgery more frequently than the average dentist will slip through the cracks. If we split the two groups out, and compare apples to apples, we can detect oral surgeons and dentists who perform surgery at an abnormally high rate when compared to their particular peer group.
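The effect of peer grouping can be sketched in a few lines of Python. Everything here is an assumption for illustration: the provider names and surgery rates are made up, and a simple z-score with a cutoff of 2 stands in for whatever statistical method a real system would use (real claims data is usually skewed, so a production method would likely differ).

```python
from collections import defaultdict
from statistics import mean, pstdev

def make_providers():
    """Build made-up (name, specialty, surgeries per 100 patients) records."""
    providers = [(f"dentist_{i}", "dentist", rate)
                 for i, rate in enumerate([2, 3] * 30)]
    providers.append(("dentist_high", "dentist", 12))
    providers += [(f"surgeon_{i}", "oral_surgeon", rate)
                  for i, rate in enumerate([38, 39, 40, 41, 42])]
    providers.append(("surgeon_high", "oral_surgeon", 60))
    return providers

def flag_outliers(providers, threshold=2.0):
    """Flag providers whose rate is more than `threshold` std devs above the mean."""
    rates = [rate for _, _, rate in providers]
    mu, sigma = mean(rates), pstdev(rates)
    if sigma == 0:
        return set()
    return {name for name, _, rate in providers if (rate - mu) / sigma > threshold}

def flag_outliers_by_peer_group(providers, threshold=2.0):
    """Run the same check separately within each specialty."""
    groups = defaultdict(list)
    for provider in providers:
        groups[provider[1]].append(provider)
    flagged = set()
    for members in groups.values():
        flagged |= flag_outliers(members, threshold)
    return flagged

providers = make_providers()
pooled = flag_outliers(providers)
grouped = flag_outliers_by_peer_group(providers)
print(sorted(pooled))   # every oral surgeon, but not dentist_high
print(sorted(grouped))  # only the two providers unusual for their specialty
```

On this toy data, the pooled comparison flags all six oral surgeons simply for being oral surgeons and misses the unusual dentist, while the peer-grouped comparison flags only the one dentist and one oral surgeon whose rates stand out among their peers.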

But is looking at the rate at which a procedure is performed enough? Most payors are only running outlier detection on a few dimensions, like amount paid and claims count by procedure code. A smart provider might evade these traditional forms of detection by not prescribing an exorbitant amount of OxyContin, instead splitting their prescriptions fairly evenly among all opioids. But even the plastic apple can’t survive the knife; with more robust and descriptive metrics, you can slice across a broader range of dimensions that take these sly maneuvers into account. Our product Absolute Insight, for example, has metrics for opioids standardized to morphine-equivalent dosage to compensate for this specific scenario.
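Here is a rough sketch of why standardizing to morphine-equivalent dosage helps. The conversion factors below are commonly cited values and are included purely for illustration (an assumption of this sketch; verify against current clinical guidance before relying on them). The provider data and function name are hypothetical, and this is not a description of how Absolute Insight computes its metrics.

```python
# Illustrative morphine milligram equivalent (MME) conversion factors.
# Assumption: commonly cited values; not clinical guidance.
MME_FACTOR = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
}

def total_mme(prescribed_mg):
    """Convert a {drug: total milligrams prescribed} mapping to total MME."""
    return sum(mg * MME_FACTOR[drug] for drug, mg in prescribed_mg.items())

# Made-up totals for two hypothetical providers.
provider_a = {"oxycodone": 3000}
provider_b = {"oxycodone": 1000, "hydrocodone": 1500, "morphine": 1500}

# Per-drug view: provider B's oxycodone volume is a third of provider A's.
print(provider_a["oxycodone"], provider_b["oxycodone"])

# Standardized view: both providers prescribe the same opioid load.
print(total_mme(provider_a), total_mme(provider_b))
```

An outlier check that looks at one drug at a time would rank provider B well below provider A, while a check on the standardized total treats them the same.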

Finding a proper apples-to-apples comparison with the appropriate metrics for healthcare fraud detection requires extensive domain knowledge alongside the statistical and computational know-how for implementation. Alivia Technology brings both. Our team of data scientists, all of whom have years of healthcare experience, partners with leading healthcare experts to continually update our fraud schemes. Our methods take into account the abnormalities of healthcare data distributions and allow you to peer group with a variety of predefined metrics, all through a configurable user interface designed for the business user. Find out more on our product page.

Sat, Aug 4, 2018 @ 14:08 PM

Into the Great Unknown

During World War II, Abraham Wald, a statistician working in the US and tasked with improving aircraft safety, proposed a paradoxical solution to boost survival rates. Unlike fellow researchers who had decided to reinforce the areas of the aircraft that were often found damaged upon return, Wald proposed they reinforce the areas that remained in perfect condition post-mission. He argued that the planes that returned were a selected sample: particular areas on these planes remained unscathed because any damage to those areas would have caused the plane to crash. Damaged areas, however, could suffer impairment without a total loss of aircraft functionality. In describing what became known as survivorship bias, Wald invoked a common concept underlying all forms of data analysis and interpretation: data has a genealogy and is affected by externalities during the collection process. You can never truly divorce data from its context.

Healthcare claims are an exceptionally biased source of clinical data for this reason. Providers are incentivized to bill for services that will pass adjudication and can record information with this in mind. This oftentimes leads to various forms of fraud, waste, and abuse. If a member is ineligible for a particular procedure that the doctor deems necessary, the provider may change the diagnosis or procedure codes billed to create a covered claim. These data survive the clinical encounter despite being an inaccurate reflection of the visit and the service provided. While checking for coherence across all available information in the patient’s health history, encompassing all their claims, can expose some inconsistencies, claims data are ultimately curated by the provider-patient relationship.

Bringing in a variety of other data sources for a more holistic view can help to address this problem and catch bad actors. Here at Alivia, we can use medical records to verify claims, despite the immense scale of such records. The greater the contextual view, the more comprehensive the data analysis. Wald was lucky enough to have a known, limited set of areas to potentially reinforce on US aircraft. With fraud, however, we only know about and can address the schemes that have been previously documented. In data science, we call these as-yet-undiscovered schemes “unknown unknowns.” Thankfully, new data-rich APIs are rolled out routinely nowadays, and the current IoT landscape is burgeoning with more and more devices with live data streams. Alivia’s real-time processing can use these new sources to probe into the great unknown, deciphering the truth within your medical claims.
