Visual C++ offers several technologies for writing multithreaded and parallel programs: the high-level Parallel Patterns Library (PPL), which is built on the Concurrency Runtime; standards-based OpenMP pragmas; the C++11 std::thread facilities; and fully automatic, compiler-driven parallelization. The right choice depends on how much explicit control you need, whether your parallelism is data-parallel or task-parallel, and whether portability to other compilers matters.

Parallelism Technology Comparison

Concurrency Runtime (PPL)

Best for: Task parallelism, parallel loops over collections, concurrent data structures. Part of the MSVC toolchain; no external dependencies.

OpenMP

Best for: Loop-level data parallelism in numerical code. Standards-based (OpenMP 2.0), portable to GCC and Clang. Enabled with /openmp.

std::thread / std::async

Best for: Portable, explicit thread management when you need fine-grained control without MSVC-specific APIs. Part of the C++11 Standard Library.

Auto-Parallelization (/Qpar)

Best for: Automatically parallelizing simple loops without code changes. Conservative: the compiler parallelizes only the loops it can determine are safe.

Concurrency Runtime and Parallel Patterns Library (PPL)

The Concurrency Runtime (ConcRT) is a concurrency framework built into MSVC that provides cooperative task scheduling, parallel algorithms, and concurrent containers. The Parallel Patterns Library (PPL) is its user-facing API layer.
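#define NOMINMAX               // keep <windows.h> from defining min/max macros
#include <windows.h>           // GetCurrentThreadId
#include <ppl.h>
#include <concurrent_vector.h>
#include <cstdio>              // printf
#include <vector>              // std::vector
using namespace concurrency;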

int main() {
    // parallel_for: divide loop iterations across available cores
    parallel_for(0, 10, [](int i) {
        // GetCurrentThreadId returns a DWORD (unsigned long), hence %lu
        printf("parallel_for iteration %d on thread %lu\n",
               i, GetCurrentThreadId());
    });

    // parallel_for_each: iterate over a container in parallel
    std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8};
    concurrent_vector<int> results;

    parallel_for_each(data.begin(), data.end(), [&](int x) {
        // concurrent_vector supports thread-safe appends; element order is not deterministic
        results.push_back(x * x);
    });

    printf("Computed %zu results\n", results.size());
    return 0;
}
The PPL is available automatically when you include <ppl.h>. No additional project settings are required beyond the default MSVC toolchain.
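
When a parallel loop only needs to accumulate a single value, a per-thread accumulator can avoid contention on a shared container. The following is a minimal sketch of that idea using concurrency::combinable, not a definitive implementation. The function name sum_of_squares is hypothetical, and the serial #else branch is an assumption added only so that the snippet also builds with compilers that do not ship <ppl.h>.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>
#include <vector>

#if defined(_MSC_VER)
#include <ppl.h>
#endif

// Hypothetical helper: returns the sum of the squares of all elements.
long long sum_of_squares(const std::vector<int>& values) {
    // parallel_for is used with an int index below, so the size must fit in int.
    assert(values.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(values.size());

#if defined(_MSC_VER)
    // Each worker thread accumulates into its own local slot, initialized to 0.
    concurrency::combinable<long long> partial([] { return 0LL; });
    concurrency::parallel_for(0, n, [&](int i) {
        const long long x = values[static_cast<std::size_t>(i)];
        partial.local() += x * x;
    });
    // Merge the per-thread partial sums after the loop has finished.
    return partial.combine(std::plus<long long>());
#else
    // Serial fallback (assumption: used only where PPL is unavailable).
    long long total = 0;
    for (int i = 0; i < n; i++) {
        const long long x = values[static_cast<std::size_t>(i)];
        total += x * x;
    }
    return total;
#endif
}
```

Calling sum_of_squares({1, 2, 3}) returns 14 on either path. The design choice here is that threads never write to shared state inside the loop; the only synchronization point is the final combine call.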

Task-Based Parallelism

#include <ppltasks.h>
#include <cstdio>   // printf
#include <vector>   // std::vector<int> produced by t1 && t2
using namespace concurrency;

int main() {
    // Create asynchronous tasks
    auto t1 = create_task([]() -> int {
        // Runs on thread pool
        return 42;
    });

    auto t2 = create_task([]() -> int {
        return 58;
    });

    // Compose tasks: run after both complete
    auto combined = (t1 && t2).then([](std::vector<int> results) {
        return results[0] + results[1];
    });

    printf("Result: %d\n", combined.get()); // Blocks until done
    return 0;
}

OpenMP

OpenMP provides compiler-directive-based parallelism through #pragma omp directives. It is a common choice for scientific computing and numerical loops where portability across compilers is important.
#include <omp.h>
#include <stdio.h>

int main() {
    const int N = 1000;
    double sum = 0.0;

    // Parallel loop with reduction
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < N; i++) {
        sum += (double)i;
    }

    printf("Sum = %.0f (expected %.0f)\n", sum, (double)N*(N-1)/2);
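    // omp_get_max_threads reports the upper bound for a parallel region, not the count actually used
    printf("Max threads available: %d\n", omp_get_max_threads());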
    return 0;
}
Enable OpenMP in Visual Studio: Project Properties → C/C++ → Language → OpenMP Support → Yes (/openmp)
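
The reduction clause is not limited to a running sum over the loop index. As a hedged illustration, the sketch below applies the same pattern to a dot product. The function name dot is hypothetical. The loop index is a signed int because the OpenMP 2.0 implementation in MSVC requires a signed integral loop variable. If the file is compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: dot product of two equally sized vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    double sum = 0.0;

    // Each thread keeps a private copy of sum; the copies are added
    // together when the parallel region ends.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the parallel result can differ from the serial one in the last bits for general inputs.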

std::thread and std::async (C++ Standard Library)

For maximum portability or when you need explicit thread lifecycle management, use C++11 standard library threading:
#include <thread>
#include <vector>
#include <iostream>
#include <mutex>

std::mutex print_mutex;

void worker(int id, int iterations) {
    double result = 0.0;
    for (int i = 0; i < iterations; i++)
        result += i * 0.001;

    std::lock_guard<std::mutex> lock(print_mutex);
    std::cout << "Thread " << id << " result: " << result << "\n";
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; i++)
        threads.emplace_back(worker, i, 10000);

    for (auto& t : threads)
        t.join();  // Wait for all threads

    return 0;
}
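
The example above manages threads directly. When each unit of work returns a value, std::async with std::future can express the same idea with less bookkeeping. The following is a minimal sketch under stated assumptions: parallel_sum and its chunks parameter are hypothetical names chosen for illustration, and std::launch::async is passed explicitly to request asynchronous execution instead of leaving the launch policy to the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums data by splitting it into at most `chunks`
// contiguous ranges, each summed by its own asynchronous task.
long long parallel_sum(const std::vector<int>& data, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t step = (data.size() + chunks - 1) / chunks;  // ceiling division

    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }

    // future::get blocks until the corresponding task has finished.
    long long total = 0;
    for (auto& part : parts) {
        total += part.get();
    }
    return total;
}
```

No mutex is needed here because each task reads a disjoint range and returns its result through its own future.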

Auto-Parallelization (/Qpar)

The MSVC auto-parallelizer analyzes loops at compile time and automatically generates multi-threaded code for loops it determines are safe to parallelize. No source changes are required.
// Compile with: cl /O2 /Qpar /Qpar-report:2 myfile.cpp
void vector_multiply(float* A, float* B, float* C, int N) {
    // The compiler may auto-parallelize this loop under /Qpar
    for (int i = 0; i < N; i++) {
        C[i] = A[i] * B[i];
    }
}

// Hint the compiler to parallelize with a specific thread count:
void hinted_loop(float* A, float* B, int N) {
    #pragma loop(hint_parallel(8))
    for (int i = 0; i < N; i++) {
        A[i] = A[i] + B[i];
    }
}
Use /Qpar-report:1 to list the loops that were parallelized, or /Qpar-report:2 to also list the loops that were not parallelized, each with a reason code.
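
To make "safe to parallelize" more concrete, the sketch below contrasts a loop whose iterations are independent with one that carries a dependency from each iteration to the next. The function names scale and prefix_sum are hypothetical. Whether MSVC actually parallelizes the first loop depends on the compiler version and its profitability heuristics, so treat this as an illustration and check the /Qpar-report output for your own build.

```cpp
#include <cassert>

// Independent iterations: out[i] depends only on in[i], so iterations
// could run in any order. This is the kind of loop /Qpar can consider.
void scale(const float* in, float* out, int n, float k) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}

// Loop-carried dependency: iteration i reads the value written by
// iteration i - 1, so the iterations must run in order.
void prefix_sum(float* v, int n) {
    assert(n >= 0);
    for (int i = 1; i < n; i++) {
        v[i] += v[i - 1];
    }
}
```

Both functions compile and behave identically with or without /Qpar; only the generated threading differs.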

Auto-Vectorization (/O2)

Auto-vectorization is enabled by default when you compile with /O2. The compiler generates SIMD instructions (SSE, AVX) for eligible loops. Use /arch:AVX2 to allow 256-bit SIMD operations.
// Compile with: cl /O2 /arch:AVX2 myfile.cpp
void add_arrays(float* __restrict a, float* __restrict b,
                float* __restrict c, int n) {
    // Candidate for auto-vectorization: with /arch:AVX2 the compiler can emit
    // 256-bit adds, the equivalent of _mm256_add_ps (8 floats per instruction)
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

// Confirm with: cl /O2 /arch:AVX2 /Qvec-report:2 myfile.cpp

When to Choose Each Approach

| Scenario | Recommended Approach |
| --- | --- |
| Parallel algorithms over collections | PPL (parallel_for, parallel_for_each) |
| Asynchronous background tasks, continuations | PPL tasks / std::async |
| Scientific loop parallelism, cross-compiler portability | OpenMP (/openmp) |
| Explicit thread management, custom synchronization | std::thread |
| Zero-code-change performance on eligible loops | Auto-parallelization (/Qpar) |
| SIMD acceleration on numeric arrays | Auto-vectorization (/O2 /arch:AVX2) |
| GPU general-purpose computing (VS 2019 and earlier) | C++ AMP (deprecated in VS 2022+) |

Do not mix OpenMP and PPL task parallelism in the same region of code. They use separate thread schedulers, so combining them can cause oversubscription (more threads than logical cores), which reduces performance.
