Concurrency Runtime and PPL: Tasks and Parallel Loops

The Concurrency Runtime (ConcRT) is a cooperative concurrency framework built into the MSVC toolchain that provides high-level abstractions for parallel and asynchronous programming. It consists of three layers: the Parallel Patterns Library (PPL) for parallel algorithms and task parallelism, the Asynchronous Agents Library for dataflow and message-passing concurrency, and the Task Scheduler for fine-grained scheduling control. Unlike raw thread-based programming, ConcRT’s cooperative scheduler avoids oversubscription by adapting thread usage to available hardware cores and work volume.

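The Concurrency Runtime relies on C++11 features including lambdas, auto, range-based for, and move semantics. MSVC's earliest selectable language standard is C++14, so the default setting is enough for most examples here (Project Properties → C/C++ → Language → C++ Language Standard → ISO C++14 or higher). The concurrent_unordered_map example uses structured bindings, which require ISO C++17 or later.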

Parallel Loops

`parallel_for`

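parallel_for distributes the iterations of an index-based loop across the available cores. It takes an integral range [first, last) and an optional positive step. Iterations may run in any order, so each one must be independent of the others.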

#include <ppl.h>
#include <algorithm>  // std::min
#include <vector>
#include <cmath>
#include <iostream>

using namespace concurrency;

int main() {
    const int N = 1000000;
    std::vector<double> results(N);

    // Divide [0, N) across CPU cores automatically
    parallel_for(0, N, [&](int i) {
        results[i] = std::sqrt(static_cast<double>(i));
    });

    // Step overload: the body is called with start = 0, 1000, 2000, ...
    // and processes one 1000-element chunk per call
    // (reduces scheduler overhead for fine-grained work)
    parallel_for(0, N, 1000, [&](int start) {
        for (int i = start; i < std::min(start + 1000, N); i++)
            results[i] = std::sin(results[i]);
    });

    std::cout << "results[500000] = " << results[500000] << "\n";
    return 0;
}
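To make the step overload concrete, the sketch below lists the half-open chunks that a stepped index loop over [first, last) visits. It is plain standard C++ with no PPL dependency, and `chunk_ranges` is a hypothetical helper name used only for illustration, not a PPL function.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (not part of PPL): returns the half-open [begin, end)
// chunks visited when an index loop advances from `first` toward `last`
// in increments of `step`. The final chunk is clamped to `last`.
std::vector<std::pair<int, int>> chunk_ranges(int first, int last, int step) {
    std::vector<std::pair<int, int>> chunks;
    if (step <= 0) return chunks;  // assumption: non-positive steps yield no chunks
    for (int start = first; start < last; start += step) {
        chunks.emplace_back(start, std::min(start + step, last));
    }
    return chunks;
}
```

With first = 0, last = 2500, and step = 1000, the chunks are [0, 1000), [1000, 2000), and [2000, 2500). The short final chunk is why the inner loop in the example clamps its upper bound with std::min.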

`parallel_for_each`

parallel_for_each works on any range defined by iterators, including standard containers:

#include <ppl.h>
#include <concurrent_vector.h>
#include <cstdio>  // printf
#include <vector>

using namespace concurrency;

struct Record { int id; double value; };

int main() {
    std::vector<Record> records(10000);
    for (int i = 0; i < 10000; i++)
        records[i] = {i, static_cast<double>(i) * 1.5};

    concurrent_vector<double> processed;

    parallel_for_each(records.begin(), records.end(), [&](const Record& r) {
        // Runs in parallel: concurrent_vector::push_back is thread-safe,
        // but the order of elements in `processed` is not deterministic
        processed.push_back(r.value * r.value);
    });

    printf("Processed %zu records\n", processed.size());
    return 0;
}
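Because push_back calls from parallel iterations can land in any order, the output does not line up with the input. When position matters, one option is to pre-size the output and have each worker write only to its own indices. The sketch below shows that idea with std::thread as a portable stand-in for the PPL scheduler. The function name `square_values_in_order` and the fixed two-way split are assumptions made for illustration.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct Record { int id; double value; };

// Sketch: each worker writes only to its own slice of a pre-sized output,
// so no synchronization is needed and out[i] always corresponds to records[i].
// std::thread is a portable stand-in here, not how PPL schedules work.
std::vector<double> square_values_in_order(const std::vector<Record>& records) {
    std::vector<double> out(records.size());
    const std::size_t mid = records.size() / 2;

    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            out[i] = records[i].value * records[i].value;
    };

    std::thread first_half(work, std::size_t{0}, mid);  // runs concurrently
    work(mid, records.size());                          // calling thread does the rest
    first_half.join();
    return out;
}
```

The same index-based write works inside a parallel_for body over [0, records.size()), since distinct iterations touch distinct elements.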

`parallel_invoke`

parallel_invoke runs a fixed set of function objects (between two and ten per call) concurrently and returns only after all of them have completed. It is well suited to divide-and-conquer algorithms.

#include <ppl.h>
#include <algorithm>  // std::sort, std::inplace_merge
#include <vector>
#include <iostream>

using namespace concurrency;

// Parallel merge sort using parallel_invoke
void parallel_mergesort(std::vector<int>& v, int left, int right) {
    if (right - left < 256) {
        // Base case: sort small segments serially
        std::sort(v.begin() + left, v.begin() + right);
        return;
    }

    int mid = left + (right - left) / 2;

    // Sort both halves in parallel
    parallel_invoke(
        [&]() { parallel_mergesort(v, left, mid); },
        [&]() { parallel_mergesort(v, mid, right); }
    );

    // Merge the sorted halves
    std::inplace_merge(v.begin() + left, v.begin() + mid, v.begin() + right);
}

int main() {
    std::vector<int> data = {9, 3, 7, 1, 5, 8, 2, 6, 4, 0};
    parallel_mergesort(data, 0, static_cast<int>(data.size()));

    for (int x : data) std::cout << x << " ";
    std::cout << "\n";
    return 0;
}
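The fork-join shape used above (split, run both halves concurrently, wait, combine) can also be sketched with the standard library. The example below uses std::async as a portable stand-in for parallel_invoke and sums a range instead of sorting it. `parallel_sum` and the cutoff value are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sketch of the fork-join shape using std::async as a portable stand-in
// for parallel_invoke. Sums the elements in v[left, right).
long long parallel_sum(const std::vector<int>& v, std::size_t left, std::size_t right) {
    const std::size_t cutoff = 4096;  // assumption: smaller ranges are summed serially
    if (right - left <= cutoff)
        return std::accumulate(v.begin() + left, v.begin() + right, 0LL);

    const std::size_t mid = left + (right - left) / 2;

    // Fork: the left half runs on another thread ...
    auto left_sum = std::async(std::launch::async,
                               [&v, left, mid] { return parallel_sum(v, left, mid); });
    // ... while the calling thread handles the right half.
    const long long right_sum = parallel_sum(v, mid, right);

    // Join: get() blocks until the left half has finished.
    return left_sum.get() + right_sum;
}
```

As in the merge sort, the serial cutoff keeps the number of concurrent branches small. Without it, the cost of creating work items would outweigh the work itself.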

Task-Based Parallelism

`task` and `create_task`

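Tasks represent asynchronous operations. They are lightweight and composable, and they run on a runtime-managed thread pool rather than on threads you create yourself (in Visual Studio 2015 and later, task uses the Windows thread pool by default).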

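#include <ppltasks.h>
#include <chrono>   // std::chrono::milliseconds
#include <iostream>
#include <string>
#include <thread>   // std::this_thread::sleep_for
#include <vector>   // result type of (t1 && t2)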

using namespace concurrency;

// Simulate fetching data asynchronously
task<std::string> fetch_data(int id) {
    return create_task([id]() -> std::string {
        // Simulate work
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        return "Data for id=" + std::to_string(id);
    });
}

int main() {
    // Chain tasks with .then() continuations
    auto pipeline = fetch_data(42)
        .then([](std::string raw) {
            // Transform the data (runs after fetch completes)
            return "Processed: " + raw;
        })
        .then([](std::string processed) {
            // Display result
            std::cout << processed << "\n";
            return processed.length();
        });

    // Block until the entire pipeline is done
    size_t len = pipeline.get();
    std::cout << "Output length: " << len << "\n";

    // Run two tasks concurrently and collect both results
    auto t1 = create_task([]() { return 10; });
    auto t2 = create_task([]() { return 32; });

    (t1 && t2).then([](std::vector<int> results) {
        std::cout << "Sum: " << results[0] + results[1] << "\n";
    }).get();

    return 0;
}
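To show what a continuation does conceptually, the sketch below rebuilds the same fetch, transform, measure pipeline on top of std::future. Standard C++ futures have no `.then()`, so the `then` helper here is hypothetical: it simply starts a new asynchronous operation that waits for the previous result and passes it on. This is not how PPL implements continuations, which are scheduled without dedicating a blocked thread to each stage.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Hypothetical helper (not a PPL or standard-library function): runs `next`
// on a new thread that first waits for the value produced by `previous`.
template <typename T, typename F>
auto then(std::future<T> previous, F next) {
    return std::async(std::launch::async,
                      [prev = std::move(previous), next]() mutable {
                          return next(prev.get());
                      });
}

// Mirrors the fetch -> transform -> measure pipeline from the example above.
std::size_t run_pipeline(int id) {
    std::future<std::string> fetched = std::async(std::launch::async, [id] {
        return "Data for id=" + std::to_string(id);
    });
    auto processed = then(std::move(fetched), [](std::string raw) {
        return "Processed: " + raw;
    });
    auto length = then(std::move(processed), [](std::string text) {
        return text.length();
    });
    return length.get();  // blocks until the whole chain has run
}
```

Each stage receives the previous stage's return value as its argument, which is the same contract that task::then provides.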

`task_group`

task_group allows dynamic task creation where the number of tasks is not known at compile time. It provides explicit wait() and cancel() support.

#include <ppl.h>
#include <cstdio>  // printf
#include <vector>
#include <iostream>

using namespace concurrency;

int main() {
    task_group tg;
    std::vector<int> inputs = {1, 2, 3, 4, 5, 6, 7, 8};

    // Submit tasks dynamically
    for (int x : inputs) {
        tg.run([x]() {
            // Each task processes one input
            printf("Processing %d → %d\n", x, x * x);
        });
    }

    // Wait for all submitted tasks to finish
    tg.wait();
    std::cout << "All tasks done.\n";

    return 0;
}

Prefer task_group over task when you need to cancel a whole set of related work items at once with task_group::cancel(). Use task (from <ppltasks.h>) when you need continuations (.then()) and composability.

Concurrent Containers

The PPL provides thread-safe containers that allow concurrent reads and writes without external locking:

concurrent_vector

concurrent_vector<T> allows concurrent push_back and element access. Elements are never moved once inserted, so existing iterators/pointers remain valid.

#include <concurrent_vector.h>
#include <ppl.h>
#include <cstdio>  // printf

using namespace concurrency;

int main() {
    concurrent_vector<int> cv;

    // Safe to call push_back from multiple threads simultaneously
    parallel_for(0, 1000, [&](int i) {
        cv.push_back(i * i);
    });

    printf("Vector size: %zu\n", cv.size());
    return 0;
}
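The stability guarantee is easiest to see next to std::vector, which may reallocate and relocate every element when it grows. The single-threaded sketch below uses std::deque purely as a stand-in, because it gives a similar guarantee for push_back: references and pointers to existing elements stay valid. std::deque is not thread-safe, and `read_through_old_pointer` is a hypothetical name.

```cpp
#include <deque>

// Stand-in demonstration: std::deque (single-threaded here) keeps pointers
// to existing elements valid across push_back, unlike std::vector, which
// may reallocate. Returns the value read through a pointer that was taken
// before the container grew.
int read_through_old_pointer(int first_value, int extra_elements) {
    std::deque<int> d;
    d.push_back(first_value);
    const int* p = &d.front();   // pointer taken while size() == 1

    for (int i = 0; i < extra_elements; ++i)
        d.push_back(i);          // growth does not relocate existing elements

    return *p;                   // still refers to the first element
}
```

With std::vector in place of std::deque, the final dereference would be undefined behavior once a reallocation had occurred.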

concurrent_queue

concurrent_queue<T> provides thread-safe push and try_pop for producer-consumer patterns.

#include <concurrent_queue.h>
#include <cstdio>  // printf
#include <thread>
#include <iostream>

using namespace concurrency;

int main() {
    concurrent_queue<int> queue;

    // Producer thread
    std::thread producer([&]() {
        for (int i = 0; i < 100; i++)
            queue.push(i);
    });

    // Consumer thread
    std::thread consumer([&]() {
        int item;
        int received = 0;
        while (received < 100) {
            if (queue.try_pop(item)) {
                printf("Consumed: %d\n", item);
                received++;
            } else {
                std::this_thread::yield();  // queue empty: let the producer run
            }
        }
    });

    producer.join();
    consumer.join();
    return 0;
}
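The key point of the interface is that try_pop never blocks: it reports an empty queue by returning false. The sketch below reproduces that contract with a mutex-protected std::queue. `LockedQueue` is a hypothetical class for illustration only and says nothing about how concurrent_queue is implemented internally.

```cpp
#include <mutex>
#include <queue>
#include <utility>

// Minimal mutex-based stand-in with the same push / try_pop shape.
// This is NOT how concurrent_queue is implemented; it only illustrates
// the non-blocking contract: try_pop returns false instead of waiting.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Moves the front element into `out` and returns true,
    // or returns false immediately (leaving `out` untouched) if empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};
```

Because a failed try_pop returns at once, a consumer that loops on it should yield or back off between attempts instead of spinning at full speed.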

concurrent_unordered_map

concurrent_unordered_map<K, V> supports concurrent insert, find, and iteration. It does not support concurrent erase.

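#include <concurrent_unordered_map.h>
#include <ppl.h>
#include <intrin.h>  // _InterlockedIncrement
#include <cstdio>    // printf
#include <string>
#include <vector>

using namespace concurrency;

int main() {
    // long (not int) so the counter can be passed to _InterlockedIncrement
    concurrent_unordered_map<std::string, long> word_count;

    std::vector<std::string> words = {
        "apple", "banana", "apple", "cherry", "banana", "apple"
    };

    parallel_for_each(words.begin(), words.end(), [&](const std::string& w) {
        // operator[] inserts safely under concurrency, but a plain ++ on the
        // mapped value would be a data race, so increment it atomically.
        _InterlockedIncrement(&word_count[w]);
    });

    // Structured bindings require /std:c++17 or later
    for (auto& [word, count] : word_count)
        printf("%s: %ld\n", word.c_str(), count);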

    return 0;
}

Asynchronous Agents Library

The Asynchronous Agents Library enables dataflow programming using message-passing between agents. Agents communicate through message blocks (unbounded_buffer, call, transformer) without sharing state.

#include <agents.h>
#include <iostream>
#include <string>

using namespace concurrency;

// A simple pipeline: producer → transformer → consumer
int main() {
    // Result buffer receives output from the transformer
    unbounded_buffer<std::string> result_buffer;

    // Transformer: converts int → string representation.
    // The optional second constructor arg is the downstream target.
    transformer<int, std::string> convert_agent(
        [](int n) -> std::string {
            return "Item: " + std::to_string(n * n);
        },
        &result_buffer   // forward transformed output here
    );

    // Producer: send 5 items directly into the transformer
    for (int i = 1; i <= 5; i++) {
        send(convert_agent, i);
    }

    // Consumer: receive and display results
    for (int i = 0; i < 5; i++) {
        std::string result = receive(result_buffer);
        std::cout << result << "\n";
    }

    return 0;
}

Cancellation

PPL tasks and parallel loops support cooperative cancellation through cancellation_token:

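#include <ppltasks.h>
#include <chrono>   // std::chrono::milliseconds
#include <cmath>    // std::sqrt
#include <iostream>
#include <thread>   // std::this_thread::sleep_for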

using namespace concurrency;

int main() {
    cancellation_token_source cts;
    cancellation_token token = cts.get_token();

    auto long_task = create_task([token]() {
        for (int i = 0; i < 1000000; i++) {
            // Check for cancellation inside the loop
            if (token.is_canceled()) {
                cancel_current_task();  // Cooperative cancellation
            }
            // Simulate work
            volatile double x = std::sqrt(static_cast<double>(i));
            (void)x;
        }
        return 42;
    }, token);

    // Cancel after a short delay
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    cts.cancel();

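    try {
        int value = long_task.get();
        // Reached only if the task finished before cancel() took effect
        std::cout << "Task completed with " << value << ".\n";
    } catch (const task_canceled&) {
        std::cout << "Task was canceled.\n";
    }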

    return 0;
}
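Cooperative cancellation means the worker itself decides when to stop, by polling a flag at points where stopping is safe. The sketch below isolates that polling loop, with a std::atomic<bool> standing in for cancellation_token. `Outcome` and `count_until_cancelled` are hypothetical names, and the loop body is left empty on purpose.

```cpp
#include <atomic>

struct Outcome {
    bool cancelled;        // true if the loop stopped because of the flag
    long long iterations;  // iterations completed before stopping
};

// Sketch of the polling pattern with a std::atomic<bool> standing in for
// cancellation_token. The worker checks the flag once per iteration and
// stops at the first check that observes it set.
Outcome count_until_cancelled(const std::atomic<bool>& cancel_requested,
                              long long limit) {
    for (long long i = 0; i < limit; ++i) {
        if (cancel_requested.load(std::memory_order_acquire))
            return {true, i};
        // ... one unit of work would go here ...
    }
    return {false, limit};
}
```

If the flag is never set, the loop runs to completion and reports that it was not cancelled. That is the same situation as a PPL task that finishes before cancel() is called.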
