Scalable analysis is common in cloud computing, yet it is rarely applied in neuroimaging, even though technologies for handling large datasets are readily available. In neuroscience, MRI and fMRI studies with many subjects have produced an explosion of imaging data that poses new challenges for data sharing, management, and processing. Most organizations use existing tools that can analyze images for only a few subjects (patients or research volunteers) at a time. When they need to analyze imaging data for a few hundred subjects, how can they scale, and how can they obtain insights in a timely manner? Many organizations are developing new tools for managing complex analysis workflows. We believe Google Cloud Platform (GCP) offers a simpler alternative: scaling up the existing tools so that they handle a large number of subjects simultaneously.
Figure 1: fMRI Pipeline
Existing Pre-processing Approach in Image Processing in Neuroscience
Before datasets are analyzed for neural activity, they go through several preprocessing steps, usually with popular tools such as FSL, AFNI, and SPM:
- Bad data: Large spikes that occur while the images are being scanned are removed. In other words, the spikes in the images are censored out of the data analysis.
- Slice timing correction: Images of adjacent parts of the brain are acquired at different time points within a single TR (repetition time). For example, the BOLD signal in different slices of the brain is sampled at different time points. Without correcting for the acquisition time of each slice, the time courses would be offset from one slice to the next. After the correction, the signal for the whole volume is treated as if it had been acquired at the same time point.
- Registration: Functional and structural images from the same subject are aligned so that functional information can be mapped into anatomical space. This step also reduces the effects of subject head motion and aligns the anatomical reference volume to a standard coordinate space.
- Smoothing: The images are smoothed in space to reduce noise, to reduce the effective number of independent statistical tests that must be made, and to increase overlap with other subjects’ results.
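To make three of the steps above more concrete, here is a minimal sketch in plain Python that applies them to one-dimensional toy time series. It is an illustration only. The function names (`censor_spikes`, `slice_time_correct`, `smooth_1d`), the median-based spike threshold, the linear interpolation, and the three-point kernel are all assumptions made for this example, not the way FSL, AFNI, or SPM implement these steps. Registration is left out because it needs full 3D volumes.

```python
from statistics import median


def censor_spikes(signal, threshold=3.0):
    """Return the indices of volumes flagged as spikes.

    Assumption for this sketch: a volume is a spike when its value
    deviates from the median by more than threshold * MAD
    (median absolute deviation).
    """
    med = median(signal)
    mad = median([abs(x - med) for x in signal])
    if mad == 0:
        # No spread at all, so nothing can be flagged with this rule.
        return []
    return [i for i, x in enumerate(signal) if abs(x - med) > threshold * mad]


def slice_time_correct(series, slice_offset, tr):
    """Shift one slice's time series back to the start of each TR.

    series[n] is assumed to be sampled at time n * tr + slice_offset.
    The value at time n * tr is estimated by linear interpolation
    between series[n - 1] and series[n].
    """
    frac = slice_offset / tr
    corrected = [series[0]]  # no earlier sample to interpolate from
    for n in range(1, len(series)):
        corrected.append(series[n] - frac * (series[n] - series[n - 1]))
    return corrected


def smooth_1d(values, kernel=(0.25, 0.5, 0.25)):
    """Smooth a 1D profile with a small kernel, renormalizing at the edges."""
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        weight = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed
```

For example, `censor_spikes([100, 101, 99, 100, 150, 100, 101])` flags only the volume at index 4, and `smooth_1d([0, 0, 4, 0, 0])` spreads the single peak over its two neighbours. Real tools work on 4D image data and use more careful interpolation and spatial Gaussian kernels.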
Problem with Existing Approach
Even for a single subject, running the pre-processing steps above with traditional fMRI tools takes a neuroimaging consultant a long time. When hundreds of subjects must be handled, the consultant has no way to scale or to finish the analysis within a specified time. The platform described in this blog is built on Google Cloud Platform and provides a scalable solution designed to complete the pre-processing tasks within the stipulated time.
Figure 2: Current Infrastructure
Figure 3: Google Infrastructure
Our pipeline is built on Google Cloud Platform (GCP). We combined several GCP components into a solution in which the pre-processing of fMRI images is highly scalable. The solution comes into play when a large number of fMRI images must be processed and high computation power is required. As discussed above, researchers traditionally work on a few subjects’ images at a time. The value our solution adds is that a large amount of data can be ingested through Google Cloud Pub/Sub. The ingested data, which covers multiple subjects, is processed in parallel by a scalable set of Compute Engine instances that have the fMRI tools (FSL/AFNI/SPM) available on them. After processing is complete, the accompanying metadata (generally textual information such as the subject, study time, series number, image size, thickness, and repetition time) is saved to BigQuery and later used for metadata analytics with Google Data Studio visualizations.
The data preprocessing pipeline inherits its robustness, speed, and scalability from the resiliency, reliability, and high performance of the underlying Google Cloud Platform services.
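The flow described above (ingest messages, fan out per-subject work, collect metadata rows) can be sketched locally with the Python standard library. This is a hedged illustration, not our production code: `queue.Queue` stands in for a Pub/Sub topic, a thread pool stands in for the set of Compute Engine instances, and a list of dictionaries stands in for rows written to BigQuery. The names `preprocess_subject` and `run_pipeline` and the message fields are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def preprocess_subject(message):
    """Hypothetical worker for one subject.

    A real worker would fetch the subject's images and run an fMRI
    tool (FSL/AFNI/SPM) on them; here we only build the metadata row.
    """
    return {
        "subject_id": message["subject_id"],
        "series_number": message["series_number"],
        "repetition_time_s": message["repetition_time_s"],
        "status": "preprocessed",
    }


def run_pipeline(messages, max_workers=4):
    """Ingest messages, process subjects in parallel, return metadata rows."""
    # Stand-in for publishing to and pulling from a Pub/Sub topic.
    inbox = queue.Queue()
    for message in messages:
        inbox.put(message)

    pending = []
    while not inbox.empty():
        pending.append(inbox.get())

    # Stand-in for a scalable pool of compute instances.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(preprocess_subject, pending))

    # Stand-in for rows that would be saved to a BigQuery table.
    return rows


messages = [
    {"subject_id": f"sub-{i:03d}", "series_number": i, "repetition_time_s": 2.0}
    for i in range(1, 6)
]
rows = run_pipeline(messages)
```

The point of the sketch is the shape of the design: each subject is an independent unit of work, so adding workers increases throughput without changing the per-subject tool.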
The current tools for processing and interpreting fMRI data can handle only a few subjects’ (patients’) scans at a time. The new tools being developed to process more subjects simultaneously rely on complex workflows. In this blog we have shown an alternative that is much simpler yet robust and scalable, built on Google Cloud Platform.
Contributor: Bhaskar Devalapalli