1. Complesso Universitario M.S. Angelo, Via Cintia, University of Naples Federico II, 80126 Naples, Italy
2. Department of Chemistry, National Research Council, Institute of Chemistry and Technology of Polymers, Naples, Italy
3. Centro Direzionale, University of Naples Parthenope, 80143 Naples, Italy
Received: March 25, 2011 / Accepted: April 08, 2011 / Published: February 15, 2012.
Abstract: We describe results obtained with the parallel software PAR TFEM, developed to address computationally intensive numerical simulations of the dynamics of three-dimensional viscoelastic fluids. The software relies on PETSc (Portable, Extensible Toolkit for Scientific Computation) components integrated with the finite element solver TFEM, already employed by chemical engineers for two-dimensional simulations. The parallelization approach is based on domain decomposition. Performance analysis is carried out using the normalized speedup on two simulation case studies.
In the last three decades, much effort has been devoted to developing accurate and robust algorithms for simulating viscoelastic flows, whose numerical solution breaks down at relatively low measures of the fluid elasticity (the High Weissenberg Number Problem, HWNP). In this regard, several techniques have been proposed to stabilize the momentum balance [1-3] and the constitutive equation [4-6]. The progress made in understanding the HWNP and finding ad hoc solutions, together with the development of efficient numerical strategies, has made it possible to model phenomena in ways that accurately predict experimental observations [7].
Nowadays, no relevant difficulty seems to arise in 2D (or 3D axisymmetric) problems. However, predictions of viscoelastic flows in fully 3D geometries are mandatory in those processes where a peculiar phenomenology occurs in the third dimension, e.g., particle chaining [8] or microfluidic flows [9]. Concerning the microfluidics area, for example, "computational analysis" of the steady three-dimensional flow fields that are typically present in microfluidic devices is also desirable. To date, almost all such numerical studies have been performed with Newtonian constitutive models only [10], and this "presents a golden opportunity for computational rheologists" [10].
The exponential growth of computational resources plays a crucial role in turning numerical algorithms into useful simulations. A simulation that yields high-fidelity results is of little use if it is too expensive to run, or if it cannot be scaled up to the resolutions, simulation times, or ensemble sizes required to describe the real-world phenomena of interest. This requires casting numerical algorithms into high-performance, scalable, and easy-to-use software tools. In this paper we describe the development of a software environment designed to simulate the dynamics of three-dimensional viscoelastic fluids by exploiting the computational power of high-performance computing environments. The computing environment that supports the simulation software relies on PETSc (Portable, Extensible Toolkit for Scientific Computation [11]) components integrated with a finite element solver (TFEM) [12], employed to discretize and solve numerical problems deriving from partial differential equation models. We refer to this software tool as PAR TFEM (release 1.0).
The paper is organized as follows: in Section 2 we describe the governing equations together with the numerical model. Section 3 discusses the numerical algorithm, while the integration of TFEM modules with PETSc data structures and the introduction of concurrency in the discretization and solution steps are described in Section 4. Three test cases are presented in Section 5 in order to validate the numerical results and highlight the limitations and performance improvements of the parallelized code; tests are carried out on a multiprocessor parallel computer. Conclusions are drawn in Section 6.
steps 1, 5, 7 and PETSc routines at steps 2, 6, 8. Algorithm 2 describes the parallel execution of steps 1, 5 and 7, while Algorithm 3 describes the final version of the parallel algorithm based on PAR TFEM and PETSc.
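To make this division of labour between TFEM modules and PETSc routines concrete, the following is a hedged sketch of the typical interleaving of local finite element work with PETSc assembly calls; tfem_element_matrix is a hypothetical stand-in for the actual TFEM routines, which are not shown in the paper.

```c
/* Hedged sketch, not the actual PAR TFEM source: local TFEM element
 * computations interleaved with PETSc assembly routines. */
#include <petscmat.h>

#define MAX_DOF 64 /* assumed upper bound on degrees of freedom per element */

/* Hypothetical TFEM-side routine: fills the dense element matrix 'ke'
 * (row-major, n_dof x n_dof) and the global indices 'idx' for element e. */
extern void tfem_element_matrix(int e, PetscInt idx[], PetscScalar ke[]);

PetscErrorCode assemble_local_elements(Mat A, int n_elem, PetscInt n_dof)
{
  PetscFunctionBeginUser;
  for (int e = 0; e < n_elem; e++) {
    PetscInt    idx[MAX_DOF];
    PetscScalar ke[MAX_DOF * MAX_DOF];
    tfem_element_matrix(e, idx, ke);           /* local (TFEM) step */
    /* PETSc step: stash values; off-process entries are communicated
       later, during assembly. */
    PetscCall(MatSetValues(A, n_dof, idx, n_dof, idx, ke, ADD_VALUES));
  }
  /* PETSc step: communicate off-process entries and finalize. */
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```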
It is worth noting that, in order to make data allocation efficient and to allow the management of very large data structures, we take care to use PETSc preallocation mechanisms when defining matrices: PETSc sparse matrices are dynamic data structures to which nonzeros can be freely added, but dynamically adding many nonzeros requires additional memory allocations, which degrades performance. The memory preallocation mechanism provides the freedom of dynamic data structures together with good performance: it allocates the memory needed for the matrix representation on the basis of an estimate of the number of nonzero elements.
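A minimal sketch of this preallocation pattern, using the current PETSc API (the 2011-era calls were named slightly differently), is given below; the per-row estimates d_nnz and o_nnz would come from the finite element connectivity.

```c
/* Minimal sketch of PETSc matrix preallocation (assumed, not the authors'
 * code). d_nnz[i] and o_nnz[i] estimate, for each locally owned row i, the
 * nonzeros in the diagonal and off-diagonal blocks of the parallel matrix. */
#include <petscmat.h>

PetscErrorCode create_preallocated_matrix(MPI_Comm comm, PetscInt n_local,
                                          const PetscInt d_nnz[],
                                          const PetscInt o_nnz[], Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, n_local, n_local, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetType(*A, MATMPIAIJ));
  /* Reserve memory up front so later MatSetValues calls do not trigger
   * repeated reallocations (the performance hazard described above). */
  PetscCall(MatMPIAIJSetPreallocation(*A, 0, d_nnz, 0, o_nnz));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```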
Performance tests are carried out on a multiprocessor system with the following hardware configuration: a blade-based cluster composed of 304 Dell PowerEdge M600 blades, each with two quad-core Intel Xeon E5410 processors at 2.33 GHz and 8 GB of RAM (16 GB on some blades of the cluster), a Mellanox Technologies MT25418 InfiniBand card, and a Cisco switch for internal InfiniBand connectivity.
boundaries along the x- and z-directions. Notice that the particle freely rotates around the vorticity axis while, due to the symmetry, no translational motion occurs. However, in order to test the accuracy of the solution, we let the particle rotate and translate freely, i.e., all six additional unknowns (three translational and three angular velocities) are added to the solution vector. Of course, five of them are expected to be zero to within the numerical error.
The linear system is again solved by MUMPS. An important remark is in order. Since the particle does not move, the positions in the fluid grid of the finite elements representing the discretized spherical surface do not change in time. Therefore, we could compute the LU factorization just once and reuse it throughout the whole time-stepping procedure. However, the absence of translational motion is a direct consequence of the specific problem under investigation and does not hold in general. Hence, to evaluate the performance of the parallel solver in the general case, we recompute the LU factorization at each time step.
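For reference, selecting MUMPS as the direct LU solver through PETSc's KSP/PC interface looks roughly as follows. This is a hedged sketch using the modern PETSc API (older releases named PCFactorSetMatSolverType differently), not the paper's actual driver code.

```c
/* Hedged sketch: a pure direct solve with MUMPS selected as the LU backend. */
#include <petscksp.h>

PetscErrorCode solve_with_mumps(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPPREONLY)); /* direct solve, no Krylov iterations */
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
  /* KSPSolve (re)factorizes whenever the operator has changed, which is
   * how the LU factorization can be recomputed at every time step. */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```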
Domain discretization is performed using two meshes:
(1) medium-size mesh: 40 × 8 × 40 elements to discretize the domain representing the viscoelastic fluid, and 62 triangular elements to discretize the spherical surface;
degradation as the number of processors increases. In general, it is known that the achievable performance of a fixed-size application depends on deep interactions among the processors, the memory system, and the interconnection network. This means that, due to the limited amount of available parallelism, there exists an "optimal" configuration of the parallel machine, and each additional processor contributes progressively less, or not at all.
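This fixed-size (strong-scaling) behaviour is the one captured by Amdahl's law; the toy computation below, with an assumed serial fraction of 5% (illustrative only, not a figure from the paper), shows how each processor doubling adds less speedup than the previous one.

```c
/* Illustrative only: Amdahl's law S(p) = 1 / (f + (1 - f)/p) for a
 * fixed-size problem, with an assumed serial fraction f = 0.05. */
#include <stdio.h>

int main(void)
{
  const double f = 0.05; /* assumed serial fraction */
  for (int p = 1; p <= 64; p *= 2) {
    double s = 1.0 / (f + (1.0 - f) / p);
    printf("p = %2d  speedup = %5.2f  efficiency = %.2f\n", p, s, s / p);
  }
  return 0;
}
```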