How to send prompts in bulk with Spring AI and Java Virtual Threads

  • May 21, 2025
Table of Contents

  • Here’s the flow:
  • Virtual Threads for Massive Parallelism
  • Spring AI Prompt Call
  • Processing in Batches
  • Handling Errors Gracefully
  • Process Results in Bulk
  • Full Implementation
  • Stay curious!

TL;DR: You’re building an AI-powered app that needs to send lots of prompts to OpenAI.

Instead of sending them one by one, you want to do it in bulk — efficiently and safely.

This is how you can use Spring AI with Java Virtual Threads to process hundreds of prompts in parallel.

When calling LLM APIs like OpenAI, you’re dealing with a high-latency, network-bound task. Normally, doing that in a loop slows you down and blocks threads. But with Spring AI and Java 21 Virtual Threads, you can fire off hundreds of requests in parallel without killing your app.

This is particularly useful when you want the LLM to perform actions such as summarizing or extracting information from lots of documents.

Here’s the flow:

  1. Get your list of text inputs.
  2. Filter the ones that need processing.
  3. Split them into batches.
  4. For each batch:
    — Use Virtual Threads to make OpenAI calls in parallel
    — Wait for all calls to finish (using CompletableFuture)
    — Save the results

Virtual Threads for Massive Parallelism

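Java Virtual Threads are perfect for this. They’re lightweight, scheduled by the JVM rather than the operating system, and they release their underlying OS thread while they wait on blocking I/O. That makes them ideal for I/O-heavy operations like talking to APIs.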

ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();

Each OpenAI request runs in its own virtual thread, without the overhead of a dedicated OS (platform) thread.
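To make the effect concrete, here is a minimal sketch that needs Java 21 and no Spring AI at all. The `slowCall` method is a hypothetical stand-in for an LLM request: it only sleeps for 200 ms to simulate network latency. The class and method names are illustrative assumptions, not part of any library.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

class VirtualThreadDemo {

    // Hypothetical stand-in for a slow, network-bound LLM call.
    static String slowCall(int id) {
        try {
            Thread.sleep(200); // simulated network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "response-" + id;
    }

    // Runs `count` blocking calls, one virtual thread per call, and waits for all of them.
    static List<String> runAll(int count) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> futures = IntStream.range(0, count)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> slowCall(i), executor))
                    .toList();
            // join() blocks until each future is done; results keep the submission order.
            return futures.stream().map(CompletableFuture::join).toList();
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        List<String> results = runAll(1_000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(results.size() + " calls finished in " + elapsed + " ms");
    }
}
```

A sequential loop over the same 1,000 simulated calls would need about 200 seconds. With one virtual thread per call, the total should be much closer to the latency of a single call, although the exact figure depends on the machine.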

Spring AI Prompt Call

You create a Prompt, then send it to the model:

ChatResponse response = chatModel.call(
  new Prompt(List.of(
    new SystemMessage("You are a helpful assistant…"),
    new UserMessage(userInput)
  ))
);

You get back a structured response. From there, you just extract the output:

String summary = response.getResult().getOutput().getText();

Processing in Batches

Sending all prompts at once isn’t a good idea (rate limits, reliability, memory). Instead, chunk them into smaller batches (e.g., 300 items):

int batchSize = 300;
int totalBatches = (inputs.size() + batchSize - 1) / batchSize;

For each batch:

  • Launch a CompletableFuture for every input
  • Wait for all with CompletableFuture.allOf(…).join()
  • Collect the results
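The batching arithmetic can be sketched as a small helper. This is an illustration under assumptions: `Batches` and `partition` are hypothetical names, and the helper returns `subList` views rather than copies.

```java
import java.util.ArrayList;
import java.util.List;

class Batches {

    // Splits `items` into consecutive sublists of at most `batchSize` elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Ceiling division: rounds up so a partial last batch is still counted.
        int totalBatches = (items.size() + batchSize - 1) / batchSize;
        List<List<T>> batches = new ArrayList<>(totalBatches);
        for (int i = 0; i < totalBatches; i++) {
            int start = i * batchSize;
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end)); // a view of the original list, not a copy
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            inputs.add(i);
        }
        List<List<Integer>> batches = partition(inputs, 300);
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size() + " items");
    }
}
```

With 1,000 inputs and a batch size of 300, this yields four batches: three of 300 items and a final one of 100.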

Handling Errors Gracefully

Each task is wrapped in a try/catch block, so if one OpenAI call fails, it doesn’t crash the batch. The failed task returns null, and you skip that result later.

.map(input -> CompletableFuture.supplyAsync(() -> {
  try {
    ChatResponse r = chatModel.call(…);
    return r.getResult().getOutput().getText();
  } catch (Exception e) {
    return null;
  }
}, executorService))
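Here is a self-contained sketch of the same pattern without Spring AI. The `fakeModel` function is a hypothetical stand-in for the model call, and `safeAsync` is an illustrative helper name. The demo uses a direct executor (`Runnable::run`) so it runs on any recent JDK; in the article’s setup you would pass the virtual-thread executor instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

class SafeCalls {

    // Runs `call` on `executor`. A thrown exception is logged and turned into null,
    // so the returned future never completes exceptionally.
    static CompletableFuture<String> safeAsync(String input,
                                               Function<String, String> call,
                                               Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return call.apply(input);
            } catch (Exception e) {
                System.err.println("Call failed for input '" + input + "': " + e.getMessage());
                return null;
            }
        }, executor);
    }

    public static void main(String[] args) {
        // Hypothetical model stand-in: rejects blank prompts, echoes everything else.
        Function<String, String> fakeModel = s -> {
            if (s.isBlank()) {
                throw new IllegalArgumentException("empty prompt");
            }
            return "summary of " + s;
        };
        Executor direct = Runnable::run; // runs each task on the calling thread

        System.out.println(safeAsync("doc A", fakeModel, direct).join()); // summary of doc A
        System.out.println(safeAsync(" ", fakeModel, direct).join());     // null
    }
}
```

Returning null keeps the batch simple, but it discards the reason for the failure, so logging inside the catch block is the only record of what went wrong.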

Process Results in Bulk

After processing each batch:

  • Filter out the failed ones
  • Process the valid results

List<String> processed = futures.stream()
.map(CompletableFuture::join)
.filter(Objects::nonNull)
.toList();
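Because failures are silently filtered out, it can help to count how many were skipped. The sketch below is one possible way to do that; `BatchResults`, `Outcome`, and `collect` are hypothetical names introduced for illustration, and the demo uses already-completed futures in place of real model calls.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;

class BatchResults {

    // Successful outputs plus the number of tasks that returned null (failed and skipped).
    record Outcome(List<String> succeeded, int failedCount) {}

    static Outcome collect(List<CompletableFuture<String>> futures) {
        // Wait for every task in the batch. This join() only throws if a future completed
        // exceptionally, which the try/catch inside each task is meant to prevent.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        List<String> succeeded = futures.stream()
                .map(CompletableFuture::join)
                .filter(Objects::nonNull)
                .toList();
        return new Outcome(succeeded, futures.size() - succeeded.size());
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = List.of(
                CompletableFuture.completedFuture("summary A"),
                CompletableFuture.<String>completedFuture(null), // stands in for a failed task
                CompletableFuture.completedFuture("summary C"));

        Outcome outcome = collect(futures);
        System.out.println(outcome.succeeded() + ", skipped " + outcome.failedCount());
        // prints: [summary A, summary C], skipped 1
    }
}
```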

Full Implementation

In this example, we take a list of texts and send them to OpenAI in batches to get a summary of each one. The calls within a batch run in parallel, which makes the process much faster. After getting the summaries, we save the results. Everything runs in a way that handles errors and avoids overloading the system.

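import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

@Service
public class BulkSummarizationService {

    private static final Logger logger = LoggerFactory.getLogger(BulkSummarizationService.class);
    // ChatModel (not ChatClient) is the type that exposes call(Prompt) returning a ChatResponse.
    private final ChatModel chatModel;
    private final TextRepository textRepository;

    public BulkSummarizationService(ChatModel chatModel, TextRepository textRepository) {
        this.chatModel = chatModel;
        this.textRepository = textRepository;
    }

    // Note: the `overwrite` flag is not used in this listing.
    public void summarizeTexts(boolean overwrite) {
        logger.info("Starting bulk summarization");
        // `var` picks up the entity type declared by TextRepository.
        var textsToSummarize = textRepository.findAll();
        logger.info("Found {} texts to summarize", textsToSummarize.size());

        if (textsToSummarize.isEmpty()) return;

        int batchSize = 300;
        int totalBatches = (textsToSummarize.size() + batchSize - 1) / batchSize;

        // One virtual thread per task; close() waits for submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < totalBatches; i++) {
                int start = i * batchSize;
                int end = Math.min(start + batchSize, textsToSummarize.size());
                var batch = textsToSummarize.subList(start, end);

                logger.info("Processing batch {} of {} ({} items)", i + 1, totalBatches, batch.size());

                // Launch one model call per text in the batch.
                var futures = batch.stream()
                        .map(text -> CompletableFuture.supplyAsync(() -> {
                            try {
                                ChatResponse response = chatModel.call(
                                        new Prompt(List.of(
                                                new SystemMessage("""
                                                    You are a helpful assistant that summarizes long pieces of text.
                                                    Focus on keeping the summary dense and informative.
                                                    Limit to 512 words.
                                                """),
                                                new UserMessage(text.getContent())
                                        ))
                                );
                                text.setSummary(response.getResult().getOutput().getText());
                                return text;
                            } catch (Exception e) {
                                // A failed call is logged and skipped instead of failing the batch.
                                logger.error("Failed to summarize text with ID: {}", text.getId(), e);
                                return null;
                            }
                        }, executor))
                        .toList();

                // Wait for the whole batch before starting the next one.
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

                // Keep only the texts whose call succeeded.
                var summarized = futures.stream()
                        .map(CompletableFuture::join)
                        .filter(Objects::nonNull)
                        .toList();

                if (!summarized.isEmpty()) {
                    textRepository.saveAll(summarized);
                    logger.info("Saved {} summaries", summarized.size());
                }
            }
        }

        logger.info("Bulk summarization complete");
    }
}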

And that’s it! You now have a high-throughput pipeline that can send hundreds of prompts to OpenAI in parallel, safely and efficiently, using nothing but Spring AI, Java Virtual Threads, and good batching.

Stay curious!
