OpenMP Tutorial



1. Your First OpenMP Program

Let's start with an example. You can download the first file, taylor.c, or just look at it below. The idea is to calculate the value e*π by first calculating e and π from their Taylor expansions and then multiplying the two values together.

In short, we know that e is given by:

   e = 1/0! + 1/1! + 1/2! + 1/3! + ...

and π is given by:

   π = 4 * (1/1 - 1/3 + 1/5 - 1/7 + ...)
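To get a feel for how quickly the two series converge, here is a minimal sketch that sums only the first few terms of each. The function names partial_e and partial_pi are illustrative and are not part of taylor.c.

```c
/* Sum the first `terms` terms of 1/0! + 1/1! + 1/2! + ...
   (hypothetical helper, not part of taylor.c). */
double partial_e(int terms)
{
    double sum = 0.0;
    double factorial = 1.0;      /* holds n! for the current term */
    for (int n = 0; n < terms; n++) {
        if (n > 0)
            factorial *= n;
        sum += 1.0 / factorial;
    }
    return sum;
}

/* Sum the first `pairs` pairs of terms of 4*(1/1 - 1/3 + 1/5 - 1/7 + ...)
   (hypothetical helper, not part of taylor.c). */
double partial_pi(int pairs)
{
    double sum = 0.0;
    for (int k = 0; k < pairs; k++) {
        sum += 1.0 / (4.0 * k + 1.0);
        sum -= 1.0 / (4.0 * k + 3.0);
    }
    return 4.0 * sum;
}
```

Twenty terms of the first series already agree with e to roughly the precision of a double, while partial_pi(1000) agrees with π to only about three decimal places. The series for π converges much more slowly, which is presumably why the program gives the π loop ten times as many iterations.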


    /*
     * taylor.c
     *
     * This program calculates the value of e*pi by first calculating e
     * and pi by their taylor expansions and then multiplying them
     * together.
     */

    #include <stdio.h>
    #include <time.h>

    #define num_steps 20000000

    int main(int argc, char *argv[])
    {
      double start, stop; /* times of beginning and end of procedure */
      double e, pi, factorial, product;
      int i;

      /* start the timer */
      start = clock();

      /* First we calculate e from its taylor expansion */
      printf("e started\n");
      e = 1;
      factorial = 1; /* rather than recalculating the factorial from
                        scratch each iteration we keep it in this variable
                        and multiply it by i each iteration. */
      for (i = 1; i<num_steps; i++) {
        factorial *= i;
        e += 1.0/factorial;
      }
      printf("e done\n");

      /* Then we calculate pi from its taylor expansion */
      printf("pi started\n");

      pi = 0;
      for (i = 0; i < num_steps*10; i++) {
        /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
           therefore we count by fours (0, 4, 8, 12...) and take
             1/(0+1) =  1/1
           - 1/(0+3) = -1/3
             1/(4+1) =  1/5
           - 1/(4+3) = -1/7 and so on */
        pi += 1.0/(i*4.0 + 1.0);
        pi -= 1.0/(i*4.0 + 3.0);
      }
      pi = pi * 4.0;
      printf("pi done\n");

      product = e * pi;

      stop = clock();

      /* clock() counts in units of 1/CLOCKS_PER_SEC seconds */
      printf("Reached result %f in %.3f seconds\n", product, (stop-start)/CLOCKS_PER_SEC);

      return 0;
    }


If you say this looks just like any normal C program, you're right. An OpenMP C program is a normal C program. In a minute we will add a few lines of code to take advantage of OpenMP, but first let's compile and run taylor.c.

Windows:
   C:\openmp> icl /O2 taylor.c
Linux:
   ~/openmp> icc -O2 taylor.c

The compiler should print its usual output with no errors.
Next, try running the program.

   C:\openmp> taylor.exe
   e started
   e done
   pi started
   pi done
   Reached result 8.539734 in 18.468 seconds

If we look at the processor usage while running it we see the following:

[Screenshot: Windows Task Manager, as described below]

The CPU usage is locked at 50%. This is because the program can only run on one core at a time. It may (as it does in this case) jump back and forth between the cores much faster than the task manager can update. However, half of the potential processing capabilities of the CPU are idle. Let's use OpenMP to fix that.

The file taylor_mp.c is a new version of taylor.c that can take advantage of some of OpenMP's features. The important differences are shown below.

      /* Now there is no first and second, we calculate e and pi */
    #pragma omp parallel sections num_threads(2) private(i)
      {
        /* private(i) above gives each thread its own copy of the loop
           counter i, so the two loops cannot interfere with each other */
    #pragma omp section
        {
          printf("e started\n");
          e = 1;
          factorial = 1; /* rather than recalculating the factorial from
                            scratch each iteration we keep it in this variable
                            and multiply it by i each iteration. */
          for (i = 1; i<num_steps; i++) {
            factorial *= i;
            e += 1.0/factorial;
          }
          printf("e done\n");
        } /* e section */

    #pragma omp section
        {
          /* In this thread we calculate pi expansion */
          printf("pi started\n");

          pi = 0;
          for (i = 0; i < num_steps*10; i++) {
            /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
               therefore we count by fours (0, 4, 8, 12...) and take
                 1/(0+1) =  1/1
               - 1/(0+3) = -1/3
                 1/(4+1) =  1/5
               - 1/(4+3) = -1/7 and so on */
            pi += 1.0/(i*4.0 + 1.0);
            pi -= 1.0/(i*4.0 + 3.0);
          }
          pi = pi * 4.0;
          printf("pi done\n");
        } /* pi section */

      } /* omp sections */
      /* at this point the threads should rejoin */

      product = e * pi;

As intelligent programmers, the first thing we notice about our Taylor series program is that the values of e and π are calculated independently of each other and can therefore be computed in parallel. We used the omp parallel sections directive to indicate this to the compiler.

In C and C++, all OpenMP directives take the form:

       #pragma omp directive [arguments]

The block of code following the parallel sections directive contains two section constructs. When the program reaches the directive it splits into two threads; the number of threads is given by the num_threads argument. We chose two because we have two sections to evaluate in parallel and our single dual-core CPU can run two threads simultaneously.
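As a small illustration of the num_threads argument, the following sketch asks for a team of threads and reports how many it actually got. The helper name team_size is hypothetical; omp_get_num_threads() is a standard OpenMP runtime routine, and the _OPENMP guard lets the code build even when the OpenMP compiler flag is left off.

```c
#ifdef _OPENMP
#include <omp.h>     /* only available when OpenMP is enabled */
#endif

/* Return how many threads actually ran a parallel region that asked for
   `requested` threads (hypothetical helper for illustration). */
int team_size(int requested)
{
    int size = 1;    /* stays 1 if the pragmas are ignored */
#pragma omp parallel num_threads(requested)
    {
#pragma omp single   /* only one thread of the team records the size */
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }
    return size;
}
```

On a dual-core machine like the one used here, team_size(2) would be expected to return 2. It returns 1 if the program was compiled without the OpenMP flag, because the compiler then ignores the pragmas, and the runtime is also allowed to supply fewer threads than requested.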

Each thread runs one section. A thread that finishes early waits at the end of the sections block for the other thread to complete its section; the two threads then merge into one and the remainder of the program is executed normally.
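The same structure can be sketched as a self-contained function. This is an illustration rather than the contents of taylor_mp.c: the name e_and_pi and its parameters are assumptions, and the loop counters are declared inside each loop (C99) so that each section works on its own counter, since variables declared inside a section belong only to the thread running it.

```c
/* Compute e and pi in two parallel sections (hypothetical helper that
   mirrors the structure of taylor_mp.c). */
void e_and_pi(int steps, double *e_out, double *pi_out)
{
    double e = 1.0;    /* written only by the first section */
    double pi = 0.0;   /* written only by the second section */

#pragma omp parallel sections num_threads(2)
    {
#pragma omp section
        {
            double factorial = 1.0;            /* local to this section */
            for (int i = 1; i < steps; i++) {  /* this i is local too */
                factorial *= i;
                e += 1.0 / factorial;
            }
        } /* e section */

#pragma omp section
        {
            for (int i = 0; i < steps * 10; i++) {
                pi += 1.0 / (i * 4.0 + 1.0);
                pi -= 1.0 / (i * 4.0 + 3.0);
            }
            pi = pi * 4.0;
        } /* pi section */
    } /* both sections have finished once execution passes this brace */

    *e_out = e;
    *pi_out = pi;
}
```

Because each of e and pi is touched by only one section, the two threads never write to the same variable. Without the OpenMP compiler flag the pragmas are ignored and the two blocks simply run one after the other, giving the same values.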

Now, let's try running it.

Windows:
   C:\openmp> icl /O2 /Qopenmp taylor_mp.c
Linux (newer Intel compilers spell the flag -qopenmp):
   ~/openmp> icc -O2 -openmp taylor_mp.c
Output:
   taylor_mp.c(24) : (col. 1) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.

Note that this time the compiler reports that it parallelized the sections as requested; line 24 of taylor_mp.c holds the parallel sections directive.
Next, try running the program.

   C:\openmp> taylor_mp.exe
   e started
   pi started
   pi done
   e done
   Reached result 8.539734 in 13.375 seconds

We have shaved more than a quarter off our running time (18.468 seconds down to 13.375 seconds, about 28%)!
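One caveat when reproducing this measurement: the C standard defines clock() as processor time, and on many non-Windows systems that time is summed over all threads, so a program that keeps two cores busy may appear not to get faster at all. The times shown here came from Windows, where Microsoft's C runtime makes clock() track elapsed time. A minimal sketch of a more portable timer, assuming the standard OpenMP routine omp_get_wtime() is available whenever the OpenMP flag is on (wall_seconds is an illustrative name, not part of the tutorial files):

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Current time in seconds, for measuring intervals
   (hypothetical helper, not part of taylor_mp.c). */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();                    /* wall-clock time from the OpenMP runtime */
#else
    return (double)clock() / CLOCKS_PER_SEC;   /* fallback: processor time */
#endif
}
```

Taking one reading before the parallel sections and one after, then subtracting, gives the elapsed time in seconds without any platform-specific divisor.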

A closer look

Let's now look in more detail at what we just did. If we compare the task manager while running our parallel program (on the right) with the old version of our program (on the left), what do we see?

[Screenshots: Windows Task Manager running the original program (left) and the parallel program (right)]

On the right we are able to use both cores at 100%. A full run of the program is shown earlier in the trace. Notice how the CPU usage drops off about halfway through? That is because the π calculation takes considerably less time than the e calculation.

   C:\openmp> taylor_mp.exe
   e started
   pi started
   pi done
   e done
   Reached result 8.539734 in 13.375 seconds

We can see this reflected in our output. The thread working on π finishes first but then has to wait for the other thread before continuing. The operation of the two programs is shown diagrammatically below.

Program flowchart
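The timings can be turned into a rough model. If the two sections take t_first and t_second seconds, the serial program needs about t_first + t_second, while the parallel one needs about as long as the slower section, because the faster thread idles until the other is done. The sketch below assumes that threading overhead is negligible; sections_speedup is a hypothetical helper name.

```c
/* Best-case speedup when two independent tasks run in two parallel sections.
   Assumes no threading overhead (hypothetical helper for illustration). */
double sections_speedup(double t_first, double t_second)
{
    double serial   = t_first + t_second;                       /* one after the other */
    double parallel = t_first > t_second ? t_first : t_second;  /* wait for the slower one */
    return serial / parallel;
}
```

With the run times measured above, the slower section accounts for roughly 13.375 seconds and the faster one for roughly 18.468 - 13.375 = 5.093 seconds, so sections_speedup(13.375, 5.093) gives about 1.38, well short of the ideal 2. Under this model the speedup reaches 2 only when the two sections take equal time.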

In the next lesson we will learn how to parallelize loops and learn a little about sharing data between threads.


