url: https://dazuozcy.github.io/posts/introdution-to-openmp-intel/#23-%E5%8F%AF%E6%80%95%E7%9A%84%E4%B8%9C%E8%A5%BF%E5%86%85%E5%AD%98%E6%A8%A1%E5%9E%8Batomicsflushpairwise%E5%90%8C%E6%AD%A5%20
The difference between a novice parallel programmer and an expert is that the expert has a collection of these fundamental design patterns in their mind.
SPMD
Single Program Multiple Data: Run the same program on P processing elements where P can be arbitrarily large.
Use the rank - an ID ranging from 0 to (P-1) - to select between a set of tasks and to manage any shared data structures.
Example:
#include "omp.h"
#include <stdio.h>

static long num_steps = 100000000;

int main()
{
    int i;
    double x, pi = 0.0, sum = 0.0;
    double step = 1.0 / (double)num_steps;
    #pragma omp parallel firstprivate(sum) private(x, i)
    {
        int id = omp_get_thread_num();
        int numprocs = omp_get_num_threads();
        /* Block distribution: thread `id` handles iterations [step1, stepN). */
        int step1 = id * num_steps / numprocs;
        int stepN = (id + 1) * num_steps / numprocs;
        if (id == numprocs - 1) {
            stepN = num_steps;
        }
        for (i = step1; i < stepN; i++) {
            x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        #pragma omp critical
        pi += sum * step;
    }
    printf("pi: %.16g\n", pi);
    return 0;
}
Loop Parallelism
Collections of tasks are defined as iterations of one or more loops.
Loop iterations are divided between a collection of processing elements to compute tasks in parallel.
Example:
#include "omp.h"
#include <stdio.h>

void calc_pi_reduction()
{
    static long num_steps = 0x20000000;
    double sum = 0.0;
    double step = 1.0 / (double)num_steps;
    double start = omp_get_wtime();
    #pragma omp parallel
    #pragma omp for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    double pi = sum * step;
    double end = omp_get_wtime();
    printf("pi: %.16g in %.16g secs\n", pi, end - start);
}
Divide and Conquer Pattern
- Use when there is a way to split the problem into subproblems and a way to recombine the subproblem solutions into a global solution.
- Define a split operation.
- Keep splitting until the subproblems are small enough to solve directly.
- Combine the subproblem solutions to solve the original global problem.
Example:
#include "omp.h"

static long num_steps = 100000000;
#define MIN_BLK 10000000

double pi_comp(int Nstart, int Nfinish, double step)
{
    int i, iblk;
    double x, sum = 0.0, sum1, sum2;
    if (Nfinish - Nstart < MIN_BLK) {
        /* Base case: small enough to integrate directly. */
        for (i = Nstart; i < Nfinish; i++) {
            x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
    } else {
        /* Split the range in half and solve each half as a task. */
        iblk = Nfinish - Nstart;
        #pragma omp task shared(sum1)
        sum1 = pi_comp(Nstart, Nfinish - iblk/2, step);
        #pragma omp task shared(sum2)
        sum2 = pi_comp(Nfinish - iblk/2, Nfinish, step);
        #pragma omp taskwait
        sum = sum1 + sum2;
    }
    return sum;
}

int main()
{
    double step, pi, sum;
    step = 1.0 / (double)num_steps;
    #pragma omp parallel
    {
        #pragma omp single
        sum = pi_comp(0, num_steps, step);
    }
    pi = step * sum;
    return 0;
}