Abstract
With the movement of magnetic resonance imaging (MRI) technology towards higher field (and therefore frequency) systems, the interaction of the fields generated by the system with patients, healthcare workers, and internally within the system is attracting more attention. Due to the complexity of the interactions, computational modeling plays an essential role in the analysis, design, and development of modern MRI systems. As a result of the large computational scale associated with most of the MRI models, numerical schemes that rely on a single computer processing unit often require a significant amount of memory and long computational times, which makes modeling of these problems quite inefficient. This paper presents dedicated message passing interface (MPI), OPENMP parallel computing solvers for finite-difference time-domain (FDTD), and quasistatic finite-difference (QSFD) schemes. The FDTD and QSFD methods have been widely used to model/analyze the induction of electric fields/currents in voxel phantoms and MRI system components at high and low frequencies, respectively. The power of the optimized parallel computing architectures is illustrated by distinct, large-scale field calculation problems and shows significant computational advantages over conventional single processing platforms.
1. Introduction
Recent progress in MRI superconducting technology has lead to a considerable increase in human exposure to very large static magnetic fields of up to several Tesla. A typical commercial MRI scanner has a main field strength between 0.5 Tesla and 4 Tesla (in the region of uniformity). However, MRI systems above 4 Tesla, such as 7 Tesla and 9.4 Tesla, are currently used for research [1–3]. Motion of patients and occupational workers through these fields can cause the induction of currents in the body [4, 5]. In addition, new imaging sequences demand larger amplitudes and switching rates in magnetic field gradients, thus increasing the likelihood of peripheral nerve stimulation (PNS) [6–9]. When radiofrequency fields are transmitted to excite a spin ensemble during imaging, electromagnetic energy is coupled to the tissue and deposited within, which can cause regional temperature elevations within the patient, thus leading to possible tissue/cell injury [10, 11]. Apart from interacting with the patient, electromagnetic fields produced by the system also couple to the conducting materials in the MRI system to induce eddy currents that degrade image quality [12]. In practice, it is difficult to measure the unperturbed fields/currents in the human numerical prediction of the induced fields, can then be very useful for the evaluation/design of electromagnetic devices. The aim of this study is to improve the performance of the conventional (quasistatic) finite-difference method, which has been extensively used to model the induction of electric fields/currents in inhomogeneous voxel phantoms of man and animal when exposed to oscillating or static magnetic field sources. The method has the advantages of simplicity and can model any shaped object and various excitation sources.
Numerical computational techniques are indispensable components for analyzing, predicting, and obtaining approximate solutions to the real world problems. The finite-difference time-domain (FDTD) method is well suited to high-frequency electromagnetic analyses due to its simplicity and efficiency in wave modeling and the ability to handle field-material interactions and nonlinear phenomena [13]. In contrast, the quasistatic finite-difference (QSFD) method is ideal for linear low-frequency problems with time-harmonic dependency. One example of this is the modeling of the interaction of magnetic fields due to MRI gradients and main superconducting coils with the patient or occupational worker [14–16].
In recent years, we have developed a series of single processor-based finite-difference schemes [14–24], which can be used to simulate a wide range of problems related to MRI and other environments. Our long term aim is to generate a complete temporal model of an MRI system including all the field generating units, passive system components, and an electrical, voxel model of the patient. A better understanding of the complex temporal interaction of the fields within the patient and the system itself can provide useful insight into system design.
Often, numerical modeling of electromagnetic field—material interactions in MRI requires high-spatial resolution, which imposes a significant computational burden. In solving large-scale-high-resolution (LSHR) models, single processor-based methods are limited and often incapable of managing the required large memory and computational time requirements. Therefore, to improve the performance of these finite difference solvers, it is necessary to further explore computational strategies such as parallelization. Parallel environments such as the message passing interface (MPI), OPENMP, and LAM/MPI have been used in large-scale computational problems [25–27]. Due to the geometrical topology of these problems, finite-difference methods and related hybrid algorithms are very adaptable to parallel computing frameworks, in which the computing task is divided and assigned to many processors with distributed or shared memory allocations. This paper outlines MPI and OPENMP parallel computing solvers dedicated to the FDTD and QSFD method, respectively. In the case of the QSFD scheme, a parallelized conjugate gradient technique is applied to solve the sparse matrix equations typical in these situations. A variety of LSHR problems has been investigated using parallel schemes. Compared to previous single processing algorithms, the proposed parallel platforms offer significant advantages in terms of improved numerical convergence and reduced computing time/memory usage. Some of the problems have been solved at high-spatial resolution, which is beyond the capability of single processing methods.
2. Methodology
2.1. Parallel Cartesian and Cylindrical FDTD Methods
Maxwell’s equations can be expressed in Cartesian or cylindrical coordinate systems with components of the magnetic and electric field intensity formulated in a general form [28], as given by the following expressions: where and for Cartesian (cylindrical) coordinates . The variable is the radial displacement [in meters] employed in the cylindrical FDTD method. For the Cartesian FDTD method, for all values of . In the discrete FDTD computational domain, the dimensions of the Yee cell are defined by and . The relative permeability , relative permittivity and material conductivity are assigned at the center of the Yee cell. Parameter signifies the impressed current density [in ]. The FDTD update coefficients are space-dependent and can be written as follows: where is the angular frequency. The abbreviations RF and LF denote radiofrequency and low frequency, respectively. The parameter is the FDTD time step, which in general coordinate notation, is limited by the Courant-Friedrich-Levy (CFL) stability condition where and are the permeability and permittivity of free space, respectively, is the safety coefficient that ensures numerical stability, and is the minimal radial component to be studied (here: ). In the Cartesian coordinate system, . During the updating of the E-field components, the coefficients are linear for the RF case and nonlinear for modeling exponential decay of propagating waves inside good conductors at LF. In LF applications, the conventional FDTD method suffers from prohibitively long computational time [20]. The weakly coupled-Maxwell’s equations can be adapted to the low-frequency regime by downscaling the speed of light constant, which permits the use of larger time steps while maintaining the validity of the CFL stability condition. In our recent study, this modification was accomplished by scaling up the permittivity of free space by a scaling factor α: . It is clear that when , the conventional FDTD methodology for RF applications is obtained. In the cylindrical formulations, the numerical singularity associated with the polar axis needs to be considered. A series expansion provides an approximation in the radial direction that satisfies regularity conditions. The combination of cylindrical and Cartesian FDTD methods for both LF and HF applications can be particularly powerful. For instance, the cylindrical FDTD method is particularly suitable for modeling of problems with cylindrical symmetry such as the generation of eddy currents in a conducting cryostat vessel during magnetic field gradient pulsing or the interaction of EM fields produced by radiofrequency resonators and surrounding system materials. The Cartesian FDTD method allows one to evaluate the electromagnetic wave propagation through tissue-equivalent body models. These FDTD algorithms can be easily adapted to a parallel architecture with the MPI library. With its reputed stability, optional communication routines, and robust compatibility, MPI is considered to be one of the prominent environments to support parallel computing.
In FDTD, the problem space has to be specifically defined before the field calculations commence. In the parallel framework, the computational space is divided into several domains of approximately equal size (see Figure 1) and each domain is managed by one process. All processes run on several processors and execute the same program, while each process operates on memory (computational) domain that is specifically allocated to it. Therefore, mutual access to the memory among the parallel processors is often not straightforward and requires management. With FDTD, additional memory can be assigned to boundary regions of computational subdomains to maintain a smooth transition between neighbouring EM fields on these interfaces. In addition, communication routines can be applied between the neighbouring processes to transmit the E- and H-field components with the consideration of the half time-step difference of these two fields to maintain unified electromagnetic field propagation throughout the entire computational space. Within the MPI frame, a load-balancing scheme is often desired to manage the task distribution in the processing network to improve the computational efficiency.

2.2 Parallel BiCG Method for QSFD Scheme
At low-source frequencies, where the dimensions of the material medium are small compared to the wavelength, the induced fields can be treated quasistatically. According to Faraday’s law, the electric field in a conducting sample is induced by a time varying magnetic field due to a source. In terms of electric and magnetic potentials, the total electric field inside the body model can be split into primary field and secondary field , according to Here, denotes the vector magnetic potential due to the source, and is the scalar electric potential. The primary electric field causes a flow of current in a conductive sample with the conductivity . Any conductivity differences along the path of the current, including the air-tissue interfaces cause nonuniformity of accumulating electric charges, giving rise to scalar potential , the negative gradient of which is the secondary field , which causes a flow of current .
Based on our implementation of QSFD method, the computation of the electric fields is given by the governing surface integral equation where represents the surface of the integral region and is the conductivity of the sample. The governing equation (5) is subject to a boundary condition of the Neumann type: which indicates that the normal component of the induced eddy current on the tissue-air boundary is zero inside the body. To calculate the scalar potential , we divide the computational space into a large number of cubic cells and then (5) is approximated for each elementary cell. After discretization and rearrangement, the scalar potential for cell can be expressed as in which subscripts and indicate the cell indices, subscript indicates two faces (0 or 1) in directions, respectively, is the unit vector normal to the cell faces, is the cell size, and is the local harmonic averaged conductivity. To build the matrix formation, we transform (7) into Equation (8) can then be expressed as a linear relation in the matrix form of , where where is the coefficient matrix, is the scalar potential solution we seek, and is the known source vector matrix of form is the number of unknowns and appears here as matrix dimension. Assuming the problem scale is , all 3-dimensional elements are indexed as . Here, represents the linear coefficients between elements and , and the matrix elements for row (see (9)), for instance, are defined as follows: The biconjugate gradient (BICG) method is useful here as the system matrix is nonsymmetric and nonsingular. To solve the linear equations, a row-indexed compact storage method [29, 30] has been utilized for the coefficient matrix . With the storage of about twice the number of nonzero matrix elements, compared to other methods requiring as much as three or five times. This scheme also provides sufficient auxiliary information on the matrix and efficient matrix operations using continuous memory. In this way, only nonzero elements have been processed. According to (9), coefficients are also assigned to air elements that neighbor elements of the problem space (i.e., those that have nonzero conductivities), thus resulting in sparsity of matrix in this linear system. BICG requires computing a matrix-vector product and a transpose product. In certain applications, the latter product may be impossible to perform, for example, if the matrix is not formed explicitly and the regular product is only given in operation form, as a function call evaluation. In a parallel computing environment, the two matrix-vector products can be performed simultaneously. However, in a distributed memory environment, there will be extra communication costs associated with one of the two matrix-vector products, depending upon the storage scheme for . A duplicate copy of the matrix alleviates this problem, at the cost of doubling the storage requirements for the matrix. A shared memory parallel scheme OPENMP is applied in this case. Without having duplicates on each subprocess, multiple threads compute on one allocated memory. In that way, one can control the number of processing threads to maximize efficiency.
2.3. Practical Applications
2.3.1. The RF FDTD Application Examples
(a) Rotary Phased Array Head Coil—Spherical Phantom
To demonstrate the performance of the parallel FDTD
method, we investigate a new type of receive-only, 4-element rotary phased
array (RPA) head coil that is designed to improve the sensitivity
profile at the center of the sample. Shown in Figure 2(a) is the modeled RPA
head coil loaded with a spherical phantom. To mimic a
simple head model, it was
assumed that the FDTD-modeled spherical phantom has a conductivity of 0.616 Sm−1 and permittivity of 48.8 Fm−1 [31]. Each coil element has a “paddle-like” structure
consisting of two main conductors. The main conductors of each coil element
will carry equal currents but in opposite directions, as a result, each coil
element will produce a plane of maximum sensitivity along the axis of the
cylindrical space. That
is, the plane of sensitivity of each coil element will cut radially or
diametrically through the cylindrical space thus improving the sensitivity at
the center of the head coil. The method of moments (MoM) available from a
commercial software package FEKO is firstly used to model the RPA head coil and
also to ensure that coil elements are decoupled, tuned, and matched to 85 MHz.
In addition, MoM is used to calculate the initial Huygence equivalent surface
sources. Once the initial Huygence equivalent surface sources are obtained and
mapped onto FDTD discrete domain, the iteration process commences and is
carried out until convergence is achieved.  fields at the mid-section
of the spherical phantom, corresponding to each individual decoupled coil
element of the RPA head coil, are calculated and thereafter used for
calculating the signal intensity (SI) profiles. The calculated SI profiles are used for comparison with the MR images obtained
from the prototype RPA head coil. After obtaining positive simulation results,
a prototype RPA head coil was constructed. Shown in Figure 2(b) is the
constructed prototype RPA head coil loaded with a spherical phantom. The
prototype RPA head coil was tested in a Bruker S200 2T whole-body MRI system,
equipped with four receiver channels. Using a MSME pulse sequence with TR =
1000 msec, TE = 19.3 msec, and  axial slices located at the mid-section
of the spherical phantom were acquired in parallel by each coil element of the
RPA head coil. For cylindrical FDTD modeling, both the conventional single processing
and the new parallel FDTD methods were employed and their computational
performances compared.
The parallel cylindrical FDTD method was implemented
on a 3-server cluster network, each with 2 XEON 3.6 GHz processors and 4 GB of
memory, using an MPI library and managed by a MPICH parallel computing
platform. In contrast, the single processing FDTD framework was evaluated on a 3 GHz/1 GB
RAM single-CPU machine, which presents identical computing performance on
servers. Due to multiuser stimulation, it is very hard to find the best
environment to test the single processor FDTD simulation in our server, so we
reported the comparison in this way.

(a)

(b)

(c)
(b) Birdcage Resonator—Human Body Model
To further demonstrate the enhanced computing power,
we have subjected the parallel Cartesian FDTD algorithm to a very high-spatial
resolution of 1 mm. For this purpose, the tissue-equivalent male whole-body
models from the Brooks Air force Database [32] at spatial resolutions of 1 mm
and 4 mm were chosen. The models were subjected to radiofrequency fields
generated by a 16-rung whole-body birdcage (volume) resonator operating at 340 MHz
(8 Tesla). The dielectric properties of all body-identified tissue types are
frequency scaled and kept constant at the designated frequency [33]. At
high frequencies, it is impossible to have a global uniform flip angle in the
whole imaging slice, field focusing scheme is therefore implemented. The input
power of the birdcage resonator is numerically adjusted (a scaling is done to
the fields) to provide an averaged 90-degree flip angle in the subregion of the
abdomen and chest, such as the heart. The space domain enclosing the birdcage resonator, perfectly
matched layers (PML) and the human phantom was modeled with  cells
at 1 mm and  cells at 4 mm resolution. After the steady-state is obtained,
the specific absorption rate (SAR) is computed using the following equation: where  is the coordinate of the voxel in designated
coordinate system,  is the conductivity of tissue  is the peak recorded electric field intensity , and  is the specific mass density of tissue .
The models were implemented on an SGI high-performance
computer (HPC) with 10 computing nodes, where each node is equipped with 2
Itanium 2nd CPU 1.5 GHz and 4 GB of memory. The 1 mm- and 4 mm-resolution cases were
performed using 10 and 5 computing nodes, respectively.
2.3.2. The QSFD Application Example
When MRI operators (radiographers or MR technicians) attend anxious, sedated, or intubated patients during an MRI exam, they can be repeatedly exposed to switched magnetic field gradients near the bore entrance at the imager end [5]. In this study, we use the QSFD-BiCG method to evaluate the exposure of an MRI occupational worker in the vicinity of the MRI machine. In particular, the computational performance of the single processing versus parallel QSFD method is assessed.
(a) Body Models
Anatomically, accurate
whole-body voxel models of male (BROOK) and female (NAOMI, [33]) are used to
represent the occupational workers. The female model was engaged at different
voxel resolutions to study the computational performances of the single processing
and parallel QSFD method. The scales of the female
model in Cartesian dimensions were  and  (in , and ) at model resolutions of 2 mm,
4 mm, 6 mm, and 8 mm, respectively. In a computationally extreme case, the male model was
studied at an ultra-high resolution of only 1 mm, whereby the scales of the
model in Cartesian dimensions were . The experimentally-determined
conductivity values by Gabriel et al. [35] of some forty body-identified tissue types were aptly
scaled to the frequency of interest and assigned to the appropriate body
voxels.
(b) Gradient Coils
An
actively-shielded whole-body assembly, consisting of the -, -, and -axis
gradient coil is used to evaluate the electric fields and current densities
induced in the worker during gradient switching [12, 35]. For mere
demonstration purposes, all three gradient coils have been designed to have
approximately same axial length of ~1.4 m, yet to remain radially separated, that is, the six coil layers (primary and
secondary) are allocated to different radii assuming a layer thickness
(including former, etc.) of 5 mm. The length of the gradient set is assumed to be
of the same length as the imager including the cryostat vessel. Table 1 lists
some important parameters while Figure 3(a) illustrates the geometries of the
three gradient coils (note that the -gradient coil is identical to the
-gradient coil when rotated by 90 degrees). It is assumed that each gradient
coil generates a normalized gradient field strength of 1 mT/m in the working
volume, and that the gradient coil currents are pulsed trapezoidally at the
frequency of 1 kHz and 100 microseconds rise time.

(a)

(b)
(c) Computational Setup
Figure 4(a) illustrates a sketch of the position and orientation for
both body models near the gradient set end outside the MRI machine. An
iterative method based on the SOR algorithm [14–16] and a new parallel
BiCG method developed in this study are engaged to compare the numerical
convergence and computing performance. All numerical modeling with the parallel QSFD-BiCG method were performed on the aforementioned SGI
HPC with 10 computing nodes. Ten subthreads are assigned by the OPENMP scheme.
The single processor simulations were performed on the same computing platform
with only one computing node assigned.

(a)

(b)

(c)
3. Results and Discussion
In our experience, numerical studies that involve high-resolution (~1 mm) models of human head, pelvis, or other parts of the body have been published. There is, however, no (or lack of) work that involves high-resolution whole-body models, such as 1 mm. This work reports interesting results involving a 1 mm resolution whole-body model, which previously was not possible on single CPU platforms.
3.1. Example RF FDTD Applications
(a) Current Carrying Loop—Spherical Phantom
Shown in Figure 2(c) are the SI profiles obtained using the MoM/FDTD method and experimentally acquired
MR images of the spherical phantom using the prototype RPA head coil. It can be
observed that both numerical and experimental results agree well and that the
sensitivity at the center of the spherical phantom, as anticipated, has been
improved using the RPA head coil. Both the single processing and parallel FDTD
methods require around 100 MB RAM and yield identical results. The computing
time using the nonparallelized FDTD method was around 182 seconds. In contrast,
the parallel FDTD method is around six times faster than the conventional
scheme.
(b) Birdcage Resonator—Human Body Model
Figure 5(a) shows the model setup involving body and the birdcage resonator. Figure 5(b) compares the specific absorption rate (SAR) at the body-model
resolution of 1 mm and 4 mm. Based on the current hardware settings, the
whole-body model at 1 mm spatial resolution required approximately 28 GB of
memory and 20 hours of computing time for 6000 FDTD iterations to get converged
results. The low-resolution simulation (4 mm) required around 500 MB of memory
and 1.8 hours of computing time. Compared to spatial SAR distribution at 4 mm,
the 1 mm resolution model provides us with clearly more detail and better tissue
specificity.
In terms of problem scale, the 1 mm resolution
case is around  times larger than the 4 mm simulation. To each
processing node involved, the time consumption for 1 mm processing is only about
11 times larger than for the lower-resolution case, which illustrates the
higher-computational efficiency of the parallel scheme.
The single processing FDTD routine at 1 mm
resolution could not be implemented due to the very large amount of memory
required. In this sense, the parallelism cannot only improve the computational
efficiency of FDTD, but enable it to solve those problems that conventional
FDTD cannot. In the parallel FDTD computing framework, data communication
occupies much less time than the computation of E-H field components. Thus,
implementations with extensive task distribution can improve computational
efficiency, but the additional communication overhead may affect the overall
performance. In theory, the time complexity is inversely proportional to the
number of parallel threads, provided there is enough random access memory (RAM)
and CPU processing resources. With the inevitable communication overhead
imposed on the individual processes, however, execution efficiency is often
significantly diminished. A load-balancing scheme is applied in the parallelization to reduce the latency time between subprocesses which eventually
alleviates both the data conflicts and iterative latency in the data
transmission. This parallel FDTD scheme has remarkable computing advantages
over conventional single
processing methods, which could be also seen in many other parallel computing
scenarios though in different applications [25–27].

(a)

(b)
3.2. Example QSFD Application
Figure 4(a) depicts the typical posture of the radiographer on the side of the patient bed, which was used in the QSFD computation. Figure 4(b) represents the spatial distributions of the in situ electric field and current density-induced in the female model during the exposure to the combination of switched gradient coils. Figure 4(c) shows a comparison of the induced average current density versus superior-inferior axis obtained using the parallel BiCG and SOR.
Although the large-scale model used is very inhomogeneous and the source field varies dramatically in the spatial domain, both methods provided remarkably similar simulation results in terms of resulting field magnitudes and spatial distributions, with less than 1% relative difference. According to the convergence performance results detailed in Table 2, the QSFD-BiCG method clearly outperforms the SOR-based QSFD technique at all nominated resolutions of the female body model.
According to Table 2, the coefficient matrix dimension grows dramatically up to 8 million elements with 2 mm resolution, which brings about tremendous computational burdens. Via the computing parallelization, both SOR and BiCG methods have been improved, while the latter shows better convergence performance.
Using the 1 mm resolution male body model, the parallel QSFD-BiCG method took around 23 hours and 28 GB of RAM to evaluate in situ electric fields/currents. The number of elements in matrix A was ~. In contrast, the single processing equivalent was unable to perform this particular simulation due to immense memory requirements. This example demonstrates one of the clear advantages of the parallel QSFD computing platform over its conventional single processor-based counterpart. Figures 6(a) and 6(b) illustrate very fine details of induced electric field and current density distributions for the male body model, respectively. According to the simulation results, the largest electric fields and current densities are located in the frontal left region of the body, as this is the part of the body that is closest to the gradient coil conductors. With well-conditioned band diagonal sparse coefficient matrices, the proposed parallel QSFD-BiCG scheme demonstrates robust performance in the handling of low-frequency electromagnetic fields problems. The BiCG method also shows efficient memory usage, as it only consumes about 1/3 of memory compared to the standard SOR scheme.

(a)

(b)
This level of high-resolution analyses is imperative and significant for those interested in a variety of numerical problems such as but not limited to: thermodynamic modeling involving a fine vascular structure, cardiac electric field propagation, hyperthermia, eddy current modeling, occupational exposure of workers to electromagnetic fields, and so forth. Most of these and other related problems would benefit from employing the whole body in the computational process, as this represents a more realistic physical scenario.
4. Conclusion
Computationally intensive numerical software has become necessary to handle the increasing complexity of electromagnetic field problems in MRI, particularly at high-field strengths. In this work, high-performance finite-difference solvers architected within a parallel framework have been presented. The potential of the optimized parallel scheme has been demonstrated in typical MRI applications. The case studies indicate that the power of the software can enable straightforward adaptation to applications involving optimization of system components.
Acknowledgments
The financial support for this project from the Australian Research Council and The Health and Safety Executive (UK) (the project managed by MCL , UK) is gratefully acknowledged.