Deep learning and increasing computational power have allowed for the creation of several systems that can diagnose patients after being trained on extensive collections of patient data.
These systems may bring improved diagnostics, but they come with their own risks and pitfalls, which have to be taken into account whenever the systems are used.
We provide an overview of the existing systems and the potential risks inherent in them.
Big Data in Healthcare
Healthcare is one of many fields with a huge amount of data to be processed. Medical journals, research papers, patient records and scans together form a mass of data that no physician can ever hope to process in its totality. This makes it a perfect field for neural-network-based solutions, which are able to sift through gigantic amounts of data and find connections and correlations that are invisible to the human eye. A big problem for these solutions is that healthcare data are very varied, unstructured and often hard to understand. Understanding the conclusion of a medical journal paper, combining it with patient data and family history, and interpreting an MRI scan of the patient's brain is a combination of several problems. Each of these problems is extremely hard on its own; combined, they are nigh impossible.
The main problem is the fact that most of this data is unstructured. Patient records are not structured databases, which computers know how to work with very well. This leads to problems with understanding the natural language found in patient reports, scientific papers and so on. Another big problem is that interpreting images is hard. Interpreting medical images is extremely hard. An AI able to interpret an arbitrary medical image and look for any abnormalities is beyond the state of the art today.
But we're getting there.
Justifying AI Decisions
A major challenge for deep-learning algorithms is the fact that the neural networks they rely on are, by their very nature, a black box. [1]
A set of neurons, connected in various ways, is trained on an extensive collection of data to solve a given problem. After the learning period, we are left with a vast network containing nothing but meaningless values connecting the individual layers of neurons. This is the major difference between conventional programming and deep learning: with a conventional program, we can follow a set of instructions that make sense to a human (or at least to some humans), debug why the program gives a particular output for a given input, and check all the calculations in between.
Even deep-learning systems that perform their given task extremely well cannot explain why they give a particular answer, and almost no debugging is possible: we can see which neurons were activated and with what intensity, but this will most likely not give us any explanation of the rationale behind the choice. In fact, there is no rationale behind the solution; it is "black magic", and the neural network is entirely dependent on the data set it was trained on. Adding a single image to a training set of millions can and will influence the weights between the neurons and thus change the classification of a potentially unlimited number of real-world inputs.
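To illustrate how limited this kind of debugging is, here is a minimal sketch (in PyTorch) of recording which neurons fire for a given input via forward hooks. The toy network, the input and the layer names are all made up for illustration; the point is that what comes out is just a table of numbers, not a rationale.

```python
import torch
import torch.nn as nn

# A toy network standing in for a real diagnostic model.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
activations = {}

def record(name):
    # Forward hook that stores each layer's output under its name.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, layer in model.named_children():
    layer.register_forward_hook(record(name))

prediction = model(torch.randn(1, 64))
# `activations` now maps layer names to tensors of activation values --
# we can see *what* fired and how strongly, but not *why* that leads
# to this particular prediction.
```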
This is a major limitation because of two kinds of issues: sociological and legal. People expect machines to be able to explain their reasoning, even in situations where a human would not have to, or where a very general explanation would suffice. We simply trust humans more than machines, even though we lack data about the error rates of humans (and these might be high indeed).
As mentioned in [1], there are efforts to make AIs more explainable, but none have so far succeeded in being fully usable. In the image-analysis domain, one potential approach is to annotate a series of training images with metadata explaining why each object should be classified the way it is. Then, by observing the activation patterns in the neural network, we might be able to say that certain neurons activate when given an image with certain characteristics. This method places a great burden on the training data set and, as far as we know, has not been implemented yet.
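A rough sketch of the second half of this idea, assuming we already have recorded hidden-layer activations and per-image metadata flags (both hypothetical here), might look as follows. It merely ranks neurons by how strongly their average activity differs between images with and without a given characteristic, and is not an implementation of any published method.

```python
import numpy as np

def neurons_for_attribute(activations, has_attribute, top_k=5):
    """Rank neurons by how differently they respond to images with a given attribute.

    activations   : array of shape (n_images, n_neurons), recorded from a trained network
    has_attribute : boolean array of shape (n_images,), taken from the image metadata
    """
    with_attr = activations[has_attribute].mean(axis=0)
    without_attr = activations[~has_attribute].mean(axis=0)
    score = np.abs(with_attr - without_attr)
    # Indices of the neurons whose activity tracks the attribute most strongly.
    return np.argsort(score)[::-1][:top_k]
```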
As is also mentioned in [1], whenever we as humans explain our reasoning, we always omit some information. Since we do not fully understand how our own brains work, we cannot explain our actions perfectly. Prof. Clune puts it this way: "It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable," and "Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete."
The resulting question is: to what level do we have the right to ask for an explanation? We would probably be satisfied if DNNs could explain themselves at least as well as a human would. That would be a major feat, and an AI that can do it does not currently exist. Moreover, the quality of an explanation might be a very tricky thing to measure.
One could argue that human brains are no different from a very large neural network and that we cannot provide an explanation for why we think what we think. The difference is that, unlike DNNs, which are trained for a specific task, we have also been trained to give explanations that other people find satisfactory.
Try explaining why you are certain that the wallet you are looking at is a wallet. Wallets have different sizes, are made of different materials and contain different objects. You might be able to put together some criteria that describe a wallet in absolute terms, but they will be cumbersome, they will miss a lot of wallets and they will accept a lot of false-positive wallets from all the wallets in the world. And isn't your explanation of what a wallet is shaped by the wallets you have seen in your life? Isn't the explanation "I've seen a thousand wallets and this looks like a wallet to me" sufficient? Would it be sufficient coming from a machine?
Legal Issues of AI Not Being Able to Explain Itself
In May 2018, the European Union's new General Data Protection Regulation (GDPR) will come into effect. Among other changes, it adds a "right to explanation": citizens will have the right to question and contest decisions that affect them and that were made on a purely algorithmic basis. Citizens will be able to ask how and why a decision about them was reached. This would be impossible with current deep-learning algorithms, as an explanation would have to show all the data the network was trained on, the logic behind the weighting of neurons, et cetera. [2]
There is debate about the extent to which this right actually exists in the new legislation. For example, the analysis in [3] disputes this interpretation and instead concludes that meaningful information about the logic involved in the decision must be provided and the data used for the decision must be specified; it argues that there is no need for a detailed explanation of how the system reached a single decision in a single case.
Whatever the case, this might not be directly relevant to AI diagnostics, as the responsibility currently lies solely with the clinician; legally, AI diagnostics only provide additional information and advice and do not make the actual decisions. It may have more of an impact on the insurance business, especially in the US, where private insurance companies do use AIs to determine the cost of insurance. But then again, EU legislation will have little effect in the US.
Even if there is currently no legal requirement to show the reasoning of DNNs, the debate raises questions about the need for self-explanatory AI from a legal perspective.
Deep Learning Counter-Examples
An interesting subfield of deep learning is the search for counter-examples (often called adversarial examples): inputs that a neural network fails to classify correctly, even though the correct answer is obvious to a human.
This is most apparent with image classifiers, as shown in [4]. Small perturbations to an image, on the scale of mild noise, can throw off practically any deep-learning network and make it misclassify an image that it would otherwise classify correctly. These small changes are practically invisible to humans, yet they cause an error in the DNN's classification.
Reference [4] shows sets of image triples: on the left, an image that the classifier correctly labels as the object depicted; on the right, an incorrectly classified picture; and in the middle, the difference in pixel values between the two. The worrying thing is that, given enough trials, such counter-examples can be found for any deep neural network. Their existence shows that even though deep neural networks are great at generalizing the features found in images, they can be fooled by extremely small differences, even when the training set of images is deemed large enough to be a general representation of a given object.
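To make the mechanism concrete, below is a minimal sketch of the fast gradient sign method, a simpler relative of the optimization-based attack used in [4] (which relied on box-constrained L-BFGS). The pretrained `model`, the input `image` tensor, its `true_label` and the perturbation budget `epsilon` are all assumed inputs for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Return a copy of `image` with a small, nearly invisible adversarial perturbation."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge every pixel by at most `epsilon` in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```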
This problem obviously generalizes to healthcare. Minuscule changes in patient records that would not affect a clinician's diagnosis can make an AI misdiagnose a patient. This unpredictability of DNNs is seen as a major drawback for their application to critical decisions.
IBM Watson Health
As mentioned in the introduction, health data are unstructured, vast and written in natural language. This led IBM to create its Watson Health division in 2012.
In its current version, Watson works only on textual data, so it cannot incorporate image scans into its decisions; this creates a dependence on a clinician (or another deep-learning solution) to extract information from scans into textual form. IBM is, of course, working to teach Watson to analyze images too.
IBM Watson for Oncology
Watson for Oncology was created by providing the basic Watson with a corpus of knowledge containing thousands of research papers, journals and textbooks, followed by extensive training from a team of clinicians. It appears to perform very well in recommending treatments for patients.
IBM Watson for Clinical Trial Matching
Watson for Oncology also allows patients to be matched to clinical trials. As the requirements of clinical trials are usually very specific, checking whether each patient is eligible for any given trial would be extremely time-consuming for a clinician. Watson can take processed patient information and list the trials that the patient might be eligible for.
Deep Patient
Deep Patient is another deep-learning solution, this one given access to patients' Electronic Health Records (EHRs) from the Mount Sinai Data Warehouse, which contains data on roughly 4 million patients. After preselection of the data, the system was trained on about 700,000 patients, which shows just how much data a deep-learning solution can process compared to a human clinician. Once trained, the system displayed an extraordinary ability to predict newly diagnosed diseases.
First, the unstructured data were preprocessed to produce standardized, annotated labels from the patients' free-text records. Unlike Watson, this was done with shallow natural language processing, which is less computationally intensive but extracts less precise information about the meaning of the text.
Data from the last year before implementation were held back from the learning algorithm and later served as test data, so that the diseases actually newly diagnosed in patients could be compared with the system's predictions.
Prediction is achieved by first processing the EHRs into a compact set of strong features for each patient: every patient is described by a vector of 500 values between 0 and 1.
For the actual disease prediction, a random forest classifier was built for each disease, based on the feature vectors of patients who displayed that disease. The random forest was then applied to each patient's vector, assigning every patient a probability of developing the given disease.
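As a rough sketch of this prediction stage (not the authors' actual code), assuming we already have the 500-value patient vectors and a per-patient diagnosis label for one disease, the per-disease classifier could look like this in scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the real inputs: the vectors would come
# from the unsupervised "deep patient" representation, the labels from the EHRs.
rng = np.random.default_rng(0)
patient_vectors = rng.random((1000, 500))       # one 500-value vector per patient
has_disease = rng.integers(0, 2, size=1000)     # 1 if the disease was diagnosed, else 0

# One random forest per disease, trained on patients with known outcomes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(patient_vectors, has_disease)

# Probability of developing this disease, assigned to every patient.
disease_probability = forest.predict_proba(patient_vectors)[:, 1]
```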
The results consistently outperformed other deep-learning techniques in almost all disease categories. This suggests that first preprocessing patient data into a deep-vector representation can have great benefits over approaches that skip this step. [7]
Bibliography
1) The Dark Secret at the Heart of AI, https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/ (accessed 15/07/17)
2) Goodman B., Flaxman S. (2016) European Union regulations on algorithmic decision-making and a “right to explanation”
3) Wachter et al. (2016) Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation
4) Szegedy et al. (2013) Intriguing properties of neural networks
5) High R., IBM Corp. (2012) The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works
6) New IBM Watson Data Platform and Data Science Experience, http://www.jenunderwood.com/2016/10/26/new-ibm-watson-data-platform-data-science-experience/ (accessed 15/07/17)
7) Miotto, R. et al. (2016) Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci. Rep.