OpenMP Tutorial
4. Putting It All Together
We now consider a more complicated example. I am simply going to copy the entire code of the example (combined_mp.c) to this page. The non-OpenMP version (combined.c) is identical except that it omits the omp directives.
1: /*
2: * combined.c
3: *
4: * This program combines what we saw before. It calculates e and pi
5: * and then integrates a polynomial. We also print the elapsed time in
6: * ms at several points in the program. The function y = x^2 has been
7: * replaced with a more complex polynomial, 3x^3 + 2x^2 + x.
8: */
9:
10: #include <stdio.h>
11: #include <time.h>
12:
13: #define num_steps 10000000 /* steps to use in Taylor expansions */
14: #define int_steps (1<<30) /* steps to use in integration */
15:
16: int main(int argc, char *argv[])
17: {
18: double start, stop; /* times of beginning and end of procedure */
19:
20: /* Values for part 1 */
21: double e, pi, factorial, product;
22: int i;
23:
24: /* Values for part 2 */
25: double sum;
26: double x;
27:
28: /* start the timer */
29: start = clock();
30:
31: #pragma omp parallel reduction(+: sum) private(i, x) /* i and x must be per-thread */
32: {
33: #pragma omp sections nowait
34: {
35: #pragma omp section
36: {
37: /* First we calculate e from its Taylor expansion */
38: printf("e started at %.0f\n", clock()-start);
39: e = 1;
40: factorial = 1;
41: for (i = 1; i<num_steps; i++) {
42: factorial *= i;
43: e += 1.0/factorial;
44: }
45: printf("e done at %.0f\n", clock()-start);
46: }
47: #pragma omp section
48: {
49: /* Then we calculate pi from its Taylor expansion */
50: printf("pi started at %.0f\n", clock()-start);
51:
52: pi = 0;
53: for (i = 0; i < num_steps*20; i++) {
54: pi += 1.0/(i*4.0 + 1.0);
55: pi -= 1.0/(i*4.0 + 3.0);
56: }
57: pi = pi * 4.0;
58: printf("pi done at %.0f\n", clock()-start);
59: }
60: } /* sections */
61:
62: /* Now we integrate the function */
63: printf("integration started at %.0f\n", clock()-start);
64: sum = 0;
65: #pragma omp for nowait
66: for (i = 0; i<int_steps; i++) {
67: x = 2.0 * (double)i / (double)(int_steps); /* value of x */
68: sum += ( 3*x*x*x + 2*x*x + x ) / int_steps;
69: }
70:
71: #pragma omp single /* we only need to print this once */
72: printf("integration done at %.0f\n", clock()-start);
73:
74:
75: #pragma omp barrier
76: /* make sure all threads are caught up before we do the multiplication */
77: product = e * pi;
78:
79: } /* omp parallel */
80:
81: /* we're done so stop the timer */
82: stop = clock();
83:
84: printf("Values: e*pi = %f, integral = %f\n", product, sum);
85: printf("Total elapsed time: %.3f seconds\n", (stop-start)/CLOCKS_PER_SEC);
86:
87: return 0;
88: }
As you can see, this is very similar to what we saw before. In fact it is basically a combination of the Taylor and integrate demo programs. The major difference is that whereas before we used the combined parallel sections and parallel for constructs, here we use separate parallel, sections, and for constructs.
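To make the contrast concrete, here is a minimal sketch (not taken from the demo programs) of a single parallel region that contains a sections construct followed by a for construct. The function name sections_then_for and the values it computes are made up for illustration.

```c
/* Hypothetical example: one parallel region, two work-sharing constructs. */
double sections_then_for(int n)
{
    double a = 0.0, b = 0.0, sum = 0.0;

    #pragma omp parallel
    {
        /* Each section is executed once, by whichever thread picks it up. */
        #pragma omp sections
        {
            #pragma omp section
            a = 2.0;
            #pragma omp section
            b = 3.0;
        } /* implied barrier: a and b are both set past this point */

        /* The same threads now share the iterations of this loop. */
        #pragma omp for reduction(+: sum)
        for (int i = 0; i < n; i++)
            sum += a * i + b;
    }
    return sum;
}
```

Calling sections_then_for(100) should give 10200, since 2*(0 + 1 + ... + 99) + 3*100 = 9900 + 300. If the compiler is not told to enable OpenMP, the pragmas are ignored and the function simply runs on one thread with the same result.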
Let's run the two versions of the program and then go through a thorough discussion.
C:\openmp>combined
e started at 0
e done at 6687
pi started at 6687
pi done at 11781
integration started at 11781
integration done at 20921
Values: e*pi = 8.539734, integral = 9.666667
Total elapsed time: 20.921 seconds

C:\openmp>combined_mp
e started at 0
pi started at 0
pi done at 5125
integration started at 5125
e done at 6672
integration started at 6672
integration done at 10297
Values: e*pi = 8.539734, integral = 9.666667
Total elapsed time: 11.860 seconds
Ok, here's what's happening. In the non-parallel case a single thread is run. First it spends ~6.7 seconds calculating e, then ~5 seconds calculating pi, then it goes on to calculate the integral (which is now more complicated than before). The integration takes about 9 seconds for a total of around 21 seconds.
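A note on the timestamps: they come from clock(), printed as raw ticks and read as milliseconds, which holds when CLOCKS_PER_SEC is 1000. The C standard describes clock() as processor time, so on some systems the figure adds up across threads and would hide any speedup. A minimal sketch of a more portable timer follows; the helper name now_seconds is hypothetical, and it assumes the OpenMP runtime function omp_get_wtime() is available whenever the compiler defines _OPENMP.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: seconds elapsed since some arbitrary starting point. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();                   /* wall-clock time from the OpenMP runtime */
#else
    return (double)clock() / CLOCKS_PER_SEC;  /* fallback: processor time in seconds */
#endif
}
```

With a helper like this, an elapsed time is just the difference of two calls, for example now_seconds() - start.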
In the parallel case things are (as expected) a lot faster and a little more complicated. We will walk through line-by-line. On line 31, the program forks into two threads due to the parallel declaration. We also declare our data reduction on the variable sum. Note that this time, the parallel declaration was not bound to any specific construct. The first construct it encounters is the sections construct on line 33. Like before, the two threads will diverge and each will do one of the sections. We also declared the sections construct with the argument nowait. Before, when each thread was working on a section in parallel, the thread that completed first would wait until the other thread was finished too before continuing. The nowait argument says, "whoever finishes first should just go on to the next thing." In this case, the next thing is to print out that the thread is starting the integration (each thread prints this, which is why the message appears twice in the output) and then begin the for loop. The for declaration on line 65 makes this a work-sharing loop, so its iterations are divided between the threads. How they are divided depends on the loop schedule, which is left to the implementation when no schedule clause is given. The first thread to arrive starts on its iterations right away, and the second joins in when it gets here. Because this for is also declared nowait, a thread that runs out of iterations moves on without waiting for the other.
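The reduction and the work-shared loop can be isolated in a small sketch. The function name integrate_poly and its steps parameter are made up for illustration; the loop body mirrors lines 67 and 68 of the listing, except that x is declared inside the loop so each thread has its own copy. As in the listing, each term is divided by the step count, so the value returned is the average of the polynomial over [0, 2], which is where the 9.666667 in the output comes from.

```c
/* Hypothetical helper mirroring the integration part of the listing. */
double integrate_poly(long steps)
{
    double sum = 0.0;

    /* Each thread gets a private sum starting at 0; the private copies
       are added into the shared sum when the parallel region ends. */
    #pragma omp parallel reduction(+: sum)
    {
        #pragma omp for nowait
        for (long i = 0; i < steps; i++) {
            double x = 2.0 * (double)i / (double)steps; /* value of x */
            sum += (3*x*x*x + 2*x*x + x) / (double)steps;
        }
    }
    return sum;
}
```

Even a fairly small step count, such as integrate_poly(1L << 20), should land close to the value printed by the full program.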
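The omp single declaration on line 71 says that only one thread should execute the printf on line 72. Whichever thread gets there first does it and the rest skip it. Note that because the for loop was declared nowait, that first thread may get here while the other is still working through its share of the iterations.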
The omp barrier directive on line 75 makes all threads wait at this point until every thread has caught up before continuing. This is very important here because e and pi are both required for the multiplication and they may have been calculated in different threads. You have already been using barriers; you just didn't know it. There is an implied barrier at the end of every OpenMP for, sections, and single construct unless nowait is given. In fact, we used the nowait argument to remove the implied barriers above. Finally, at line 79, the two threads merge and we resume serial processing.
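Here is a minimal sketch of that pattern on its own: two sections declared nowait, then an explicit barrier before the values are combined. The function name e_times_pi and the terms parameter are hypothetical, and each section uses its own local variables rather than the shared ones in the listing.

```c
/* Hypothetical helper: compute e and pi in separate sections, then multiply. */
double e_times_pi(int terms)
{
    double e = 0.0, pi = 0.0, product = 0.0;

    #pragma omp parallel
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            {
                double acc = 1.0, factorial = 1.0;
                for (int k = 1; k < terms; k++) {
                    factorial *= k;
                    acc += 1.0 / factorial;
                }
                e = acc;
            }
            #pragma omp section
            {
                double acc = 0.0;
                for (int k = 0; k < terms; k++) {
                    acc += 1.0 / (k * 4.0 + 1.0);
                    acc -= 1.0 / (k * 4.0 + 3.0);
                }
                pi = acc * 4.0;
            }
        }
        /* With nowait, a thread can arrive here before e or pi is ready... */

        #pragma omp barrier
        /* ...but past the barrier both sections are guaranteed to be done. */

        #pragma omp single
        product = e * pi;
    }
    return product;
}
```

Without the barrier, a thread that finished its section early could read e or pi before the other thread had written it. With a million terms, the result should agree with the 8.539734 printed by the full program to several decimal places.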
Now look at the output from executing this program. The calculations of e and π both start simultaneously. After about 5 seconds, the thread calculating π is finished and it begins work on the integral. About 1.5 seconds later (6.7 seconds total), the other thread is done calculating e and joins the first in evaluating the integral. The integration is reported done at ~10.3 seconds, having been worked on for roughly 5.2 seconds by one thread and 3.6 seconds by the other; recall that in the serial program the integral took 9 seconds total. The total running time for the program is about 11.9 seconds, a little more than half that of the serial program.
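The comparison can be put as a speedup figure using the two totals printed above. The symbols S (speedup) and E (efficiency) are introduced here for illustration, and the division by 2 assumes the two threads described in the walkthrough.

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
  = \frac{20.921\ \text{s}}{11.860\ \text{s}} \approx 1.76,
\qquad
E = \frac{S}{2} \approx 0.88
```

A perfect split across two threads would give S = 2, so this run recovers roughly 88% of the ideal.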
Congratulations!! You are now a parallel programmer. If you want to continue learning about OpenMP including the runtime libraries, you can read the full specification at openmp.org.
Copyright 2006 kallipolis.com