Data Scientists, Software Engineers And The Future of Medicine

Written by TheLoneroFoundation | Published 2020/01/18
Tech Story Tags: data-science | big-data | bioinformatics | computational-biology | grid-computing | ai-in-medicine | machine-learning | latest-tech-stories

TLDR: The rise of next-generation computing, cloud computing technologies, AI, decentralization, etc. has dramatically changed seemingly every industry. Computational Medicine is a subfield of computational biology, and I feel that with the research I'm doing related to age-related macular degeneration, Computational Ophthalmology will become a new field. My idea is to take a distributed computing approach to generating synthetic data files that are statistically similar to the original biomedical data files. The project was something minimal I was working on for a demo I wanted to submit to DataSci '17.

The world is changing, especially the way we cure ourselves. The rise of next-generation computing, cloud computing technologies, AI, decentralization, etc. has dramatically changed seemingly every industry. Computational Medicine is now an emerging discipline.
Computational Medicine is a subfield of computational biology, and I feel that with the research I'm doing related to age-related macular degeneration, Computational Ophthalmology will become a new field. Other fields are also emerging out of the rise of modern computing, including Quantum Recursiveness, which is a subset of Quantum Cryptography.

Some Background

Currently, I do computational genomics research for a private startup known as Bleunomics. The research is mainly centered on data-driven solutions for clinical drug discovery, protein analysis, genomic modeling, etc., in which I build extensive statistical models and utilize technologies such as grid computing in order to achieve optimal results related to certain variables or biomarkers I want to find.

Examples

An example project I have worked on is shown above: a project centered on visualizing cancer genomic case studies. The above image shows the genetic mutations data view in RapidMiner, and the image below shows an experimental data view for donors.
The whole inspiration for my project at the time was:
Imagine being able to mine through hundreds of thousands of cancer case studies at once and come up with a conclusion.
That project was something minimal I was working on for a demo I wanted to submit to DataSci '17. I later made it into a RapidMiner extension, and it is quite minimal considering that a further project of mine was Cancer@Home.
Yet another project I was doing with cancer-related data was visualizing breast cancer data with Neo4j and GraphXR. The above image shows the default GraphXR dashboard view I created.
The image above (also related to the Neo4j and GraphXR project) shows the different ways the data have been categorized, color-coded, and tagged. This shows only a tiny fraction of what is possible for data scientists to do with biomedical data.
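To give a feel for what querying such a graph looks like outside the GraphXR interface, here is a minimal sketch using the official neo4j Python driver. The node labels, relationship types, property names, and connection details below are hypothetical placeholders, not the actual schema of my breast cancer graph:

    from neo4j import GraphDatabase

    # Connection details and the graph schema below are hypothetical placeholders.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    # Cypher query: which genes carry mutations across the most patients?
    query = """
    MATCH (p:Patient)-[:HAS_MUTATION]->(m:Mutation)-[:IN_GENE]->(g:Gene)
    RETURN g.symbol AS gene, count(DISTINCT p) AS patients
    ORDER BY patients DESC
    LIMIT 10
    """

    with driver.session() as session:
        for record in session.run(query):
            print(record["gene"], record["patients"])

    driver.close()

A query along these lines is roughly the kind of thing that would sit behind a color-coded GraphXR view, since GraphXR pulls from the same Neo4j database.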
Another example of what is possible for data scientists is what they may be able to accomplish with software such as SnapGene Viewer. One file I loaded was the FASTA file of the AP014690.1 sequence, which comes from a genomic analysis of the organism Asaia bogorensis NBRC 16594. SnapGene Viewer allows you to visualize the genomic sequence, tag sequence classes, and change the default view settings of your visualization.
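For those who prefer working programmatically, roughly the same inspection can be done in a few lines of Biopython. This is just a minimal sketch; it assumes the AP014690.1 sequence has already been downloaded locally under the hypothetical filename "AP014690.1.fasta":

    from Bio import SeqIO

    # "AP014690.1.fasta" is an assumed local filename for the downloaded sequence.
    record = SeqIO.read("AP014690.1.fasta", "fasta")
    seq = record.seq

    print("ID:", record.id)
    print("Description:", record.description)
    print("Length:", len(seq), "bp")

    # Simple nucleotide composition summary, computed directly from the sequence.
    counts = {base: seq.count(base) for base in "ACGT"}
    gc_content = 100.0 * (counts["G"] + counts["C"]) / len(seq)
    print("Base counts:", counts)
    print("GC content: %.2f%%" % gc_content)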
Speaking of visualization, now come even more advanced challenges. Open source data sets I have been looking into include ones such as the cohort study seen here. The above visualization was compiled with RapidMiner Studio, and now I can separate the alleles, color-code everything, and maybe even tag headers in ways that may be similar to a VCF file. VCF files are Variant Call Format files for containing and annotating genomic variants. What I wanted to do with RapidMiner was create tags based on color for pre-existing data. This was simply accomplished by converting the files for RapidMiner to read in the data and changing the settings of the graph's visual. This was a small data visualization, but it brings me to my next idea.
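Since the VCF layout comes up again below, here is a minimal plain-Python sketch of reading one. It only looks at the eight fixed columns defined by the VCF specification, and the filename "variants.vcf" is a placeholder:

    # Standard fixed columns defined by the VCF specification.
    COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

    def read_vcf(path):
        """Yield each variant record as a dict keyed by the standard VCF columns."""
        with open(path) as handle:
            for line in handle:
                if line.startswith("#"):  # skip meta-information and header lines
                    continue
                fields = line.rstrip("\n").split("\t")
                yield dict(zip(COLUMNS, fields[:8]))

    # "variants.vcf" is a placeholder filename.
    for variant in read_vcf("variants.vcf"):
        print(variant["CHROM"], variant["POS"], variant["REF"], ">", variant["ALT"])

Everything after the header lines is tab-separated, which is part of what makes VCFs straightforward to convert into the tabular form tools like RapidMiner expect.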
One of my biggest challenges right now is trying to create computational simulations of pre-existing nucleotide sequences. This led me to research distributed computing and generative adversarial networks. A file I was trying to create was a simulated VCF built with freebayes, vcfy, and the decentralized-internet SDK for distributed computing, in order to simulate biostatistical generative adversarial networks. The VCF I was trying to make used the original FASTA file of the AP014690.1 sequence, since I admired that researcher's data set. The end result was here. As you can see, the example file (as of the writing of this article) wasn't annotated, and I need to figure out a way to annotate the VCF, standardize a data format that could result directly in annotations, or modify the generative adversarial pipeline I am trying to make so that it automatically annotates data in any generated output VCF. Thus far, I have been looking into the Hardy-Weinberg principle for this approach, as sketched below.
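As a concrete illustration of that last point, here is a minimal sketch of a Hardy-Weinberg equilibrium check on genotype counts, the kind of sanity test that generated variants could be run through. The observed counts are made-up example numbers, not real data:

    from scipy.stats import chisquare

    # Made-up observed genotype counts at one biallelic site (AA, Aa, aa).
    obs = {"AA": 180, "Aa": 240, "aa": 80}
    n = sum(obs.values())

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * obs["AA"] + obs["Aa"]) / (2 * n)  # frequency of allele A
    q = 1 - p                                  # frequency of allele a

    # Expected genotype counts under Hardy-Weinberg equilibrium: p^2, 2pq, q^2.
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [obs["AA"], obs["Aa"], obs["aa"]]

    # ddof=1 accounts for the degree of freedom lost estimating p from the data.
    stat, pvalue = chisquare(observed, f_exp=expected, ddof=1)
    print("chi-square = %.3f, p-value = %.4f" % (stat, pvalue))

Under Hardy-Weinberg equilibrium the expected genotype frequencies are p^2, 2pq, and q^2, so a large chi-square statistic (small p-value) flags genotype counts that drift away from what the allele frequencies predict.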
The UML diagram I made above shows how I would propose a pipeline for a fully efficient generative adversarial network receiving biomedical data. My idea is to take a distributed computing approach to generating synthetic data that is statistically similar in its variance to the original biomedical data files. Why could this be useful? Imagine, as a medical researcher, having a data file of 50 patient donors for clinical trials and being able to extend the data to 500 donor sets with a high confidence interval.
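To make the GAN part of that pipeline less abstract, here is a minimal PyTorch sketch of the adversarial training loop for tabular records. It is not the pipeline from the diagram and leaves out the distributed computing layer entirely; the feature and latent dimensions are arbitrary assumptions:

    import torch
    import torch.nn as nn

    N_FEATURES = 16   # hypothetical number of encoded features per donor record
    LATENT_DIM = 32   # size of the random noise vector fed to the generator

    # Generator: maps random noise to a synthetic feature vector.
    generator = nn.Sequential(
        nn.Linear(LATENT_DIM, 64), nn.ReLU(),
        nn.Linear(64, N_FEATURES),
    )

    # Discriminator: scores how "real" a feature vector looks (1 = real, 0 = fake).
    discriminator = nn.Sequential(
        nn.Linear(N_FEATURES, 64), nn.LeakyReLU(0.2),
        nn.Linear(64, 1), nn.Sigmoid(),
    )

    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    loss_fn = nn.BCELoss()

    def train_step(real_batch):
        """One adversarial training step on a batch of real (encoded) records."""
        batch_size = real_batch.size(0)
        ones = torch.ones(batch_size, 1)
        zeros = torch.zeros(batch_size, 1)

        # 1. Train the discriminator on real vs. generated records.
        fake_batch = generator(torch.randn(batch_size, LATENT_DIM)).detach()
        d_loss = loss_fn(discriminator(real_batch), ones) + \
                 loss_fn(discriminator(fake_batch), zeros)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # 2. Train the generator to make the discriminator call its output real.
        z = torch.randn(batch_size, LATENT_DIM)
        g_loss = loss_fn(discriminator(generator(z)), ones)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()

Once trained, calling generator(torch.randn(k, LATENT_DIM)) would produce k synthetic records in the same encoded feature space, which could then be decoded and compared statistically against the original donors.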
The process of utilizing biomedical generative adversarial networks may be a key part of the future of medicine, given that you can create synthetic copies of data. Finding cures from these copies or discovering drug targets may be possible. My research into building even a very basic GAN is still very much in beta, but there is hope for the future.
This gives you just a small insight into what I do from day to day. I and many other computational biologists are working on quite advanced things. For example, I have already discovered a protein that I may want artificially synthesized, using data-driven computational techniques. I am also doing research in a variety of different areas, hoping to find new insights on age-related macular degeneration.
Other areas of research I hope to pursue include data-driven analysis related to immune boosters for cancer (something many researchers have been trying for years). I believe advanced statistical techniques will be used in the future to find the missing links for a universally adopted cancer immunology treatment.

The Future

Given the recent technological disruptions we have seen, many people are now able to mine genomic or biomedical data on their low-spec laptops. Data scientists with a vast understanding of statistics are capable of helping in the clinical drug discovery process in ways that would have been unfeasible a decade ago. Algorithms are changing the way medicine is done, and engineers may become the doctors of the future. In terms of my predictions for the next decade, here are just some of them:
  1. Computational Medicine will become a huge emerging field.
  2. Because of startups like Bleunomics, and what others are doing in analyzing AMD data, Computational Ophthalmology will become a thing and will likely be taught in many universities.
  3. Data scientists working in the medical field may earn as much as neurosurgeons, or as much as they would have earned working in hedge funds or other financial service markets.
  4. Computer Scientists, Software and Computer Engineers, Quantum Engineers, etc. will start turning in massive numbers toward bioinformatics and/or computational medicine.
  5. Data-driven clinical drug discovery will become a more mainstream, massively adopted phrase.
  6. At least 25% of all newly created vaccines will be synthetic vaccines that were made as a result of some sort of algorithmic discovery.
  7. The size of medical equipment and the need for hospital staff will be greatly reduced due to major disruptions in the industry.
Maybe my predictions will come true. They could also end up being entirely false. What I can say for certain, though, is that I feel this post made it quite clear why we data people and programmers are the future of medicine.

Written by TheLoneroFoundation | Big fan of decentralized software and the pursuit of scientific research.
Published by HackerNoon on 2020/01/18