Machine Learning Based SPAM Detection Using ONNX in Java
- February 10, 2026
Believe it or not, it is possible to do Machine Learning in Java. In this article I go over how to implement a Spring Boot API for spam detection using an advanced anti-spam model from the Hugging Face onnx-community and Microsoft's ONNX Runtime for Java.
We will package the API up as a Docker image, which we can run as a container using docker or podman, and in theory you could even deploy it on your Kubernetes cluster, if you fancy.
The code for this project is on a GitHub repo: https://github.com/zikani03/spam-detection-with-onnx
Which model to use?
SPAM detection is a very important part of modern digital communications, especially if you're running platforms that accept User Generated Content (UGC). Implementing spam detection is one of the classic machine learning problems, and there are many approaches to it.
Fortunately, it is now possible to find an open spam detection model on Hugging Face and use it without much ado, even for commercial use. As I was looking around on Hugging Face I came across OTIS, whose project description says:
Otis is an advanced anti-spam artificial intelligence model designed to mitigate and combat the proliferation of unwanted and malicious content within digital communication channels.
Sounds interesting enough, so I looked to see if there was an ONNX version of this model and was glad to find that the onnx-community organization has exactly that, here.
So the next step was to download the model.onnx and tokenizer.json files and include them in the project. For the curious, Otis is licensed under the BSD 3-Clause license.
The Controller
The controller isn't much, but here it is for reference. As you can see, we define our API endpoint at the path /api/spam/check, which is intended to be called via a POST request. We rely on Spring's content negotiation for requests and responses, meaning we can send and receive JSON.
```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import static org.springframework.http.ResponseEntity.ok;

@RequestMapping("/api/spam/check")
@RestController
public class SpamCheckerController {

    private final SpamDetectionService spamDetectionService;

    public SpamCheckerController(SpamDetectionService spamDetectionService) {
        this.spamDetectionService = spamDetectionService;
    }

    @PostMapping
    public ResponseEntity<SpamCheckResponse> checkSpam(@RequestBody SpamCheckRequest request) throws Exception {
        return ok(spamDetectionService.detectSpam(request));
    }
}
```
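The SpamCheckRequest and SpamCheckResponse types aren't shown in this post, but judging from the JSON payloads in the curl example and sample response later on, they are presumably simple records along these lines (a sketch; the actual definitions in the repo may differ):

```java
// Hypothetical shapes, inferred from the JSON payloads shown later in this article.
public record SpamCheckRequest(String requestId, String content, String token) {}

public record SpamCheckResponse(String result, float confidence, String id, long checkDurationMillis) {}
```

Since Jackson serializes records by their component names, Spring can map the JSON request body and response without any extra configuration.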
The Spam Detection Service
The end goal is to have an API that can be called from an HTTP client. In order to separate concerns, we place the inference code for spam detection in a class named SpamDetectionService with an appropriate @Service annotation.
Inside this class we leverage the ONNX Runtime for Java, passing the model path to an OrtSession and the tokenizer path to a HuggingFaceTokenizer. Here is the full code of the service:
```java
import java.io.IOException;
import java.nio.LongBuffer;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class SpamDetectionService implements AutoCloseable {

    private final HuggingFaceTokenizer tokenizer;
    private final OrtEnvironment env;
    private final OrtSession session;

    public SpamDetectionService(
            @Value("${model.path:/models/model.onnx}") String modelPath,
            @Value("${tokenizer.path:/models/tokenizer.json}") String tokenizerPath) throws IOException, OrtException {
        this.env = OrtEnvironment.getEnvironment();
        // Session options -- no particular settings for GPU or CUDA environments
        OrtSession.SessionOptions options = new OrtSession.SessionOptions();
        options.setInterOpNumThreads(2);
        this.session = env.createSession(modelPath, options);
        this.tokenizer = HuggingFaceTokenizer.builder()
                .optPadding(true)     // Add padding tokens if text is too short
                .optTruncation(true)  // Cut off if text is too long
                .optTokenizerPath(Paths.get(tokenizerPath))
                .build();
    }

    public SpamCheckResponse detectSpam(SpamCheckRequest request) throws OrtException {
        long startTime = System.currentTimeMillis();
        var response = this.detectSpam(request.content());
        long endTime = System.currentTimeMillis();
        return new SpamCheckResponse(
                response.label,
                response.confidence,
                request.requestId(),
                endTime - startTime
        );
    }

    private RawResult detectSpam(String text) throws OrtException {
        Encoding encoding = tokenizer.encode(text);
        long[] inputIds = encoding.getIds();
        long[] attentionMask = encoding.getAttentionMask();
        long[] shape = {1, inputIds.length};

        Map<String, OnnxTensor> inputs = new HashMap<>();
        try {
            inputs.put("input_ids", OnnxTensor.createTensor(env, LongBuffer.wrap(inputIds), shape));
            inputs.put("attention_mask", OnnxTensor.createTensor(env, LongBuffer.wrap(attentionMask), shape));

            // Some models also expect token_type_ids; for a single sequence these are all zeros
            if (session.getInputNames().contains("token_type_ids")) {
                long[] tokenTypeIds = new long[inputIds.length];
                inputs.put("token_type_ids", OnnxTensor.createTensor(env, LongBuffer.wrap(tokenTypeIds), shape));
            }

            String outputName = session.getOutputNames().iterator().next();
            try (OrtSession.Result results = session.run(inputs)) {
                return formatResults(results, outputName);
            }
        } finally {
            // Close every input tensor exactly once
            inputs.values().forEach(OnnxTensor::close);
        }
    }

    record RawResult(String label, float[] probs, float cleanProb, float scamProb, float confidence) {}

    private RawResult formatResults(OrtSession.Result results, String outputName) throws OrtException {
        float[][] logitsArray = (float[][]) results.get(outputName).get().getValue();
        float[] rawLogits = logitsArray[0];
        float[] probs = softmax(rawLogits);
        float cleanProb = probs[0] * 100;
        float scamProb = probs[1] * 100;
        int prediction = (probs[1] > probs[0]) ? 1 : 0;
        String label = (prediction == 1) ? "SCAM" : "CLEAN";
        float confidence = (prediction == 1) ? scamProb : cleanProb;
        return new RawResult(label, probs, cleanProb, scamProb, confidence);
    }

    public static float[] softmax(float[] logits) {
        float[] probabilities = new float[logits.length];
        // Subtract the max logit before exponentiating, for numerical stability
        float maxLogit = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            if (v > maxLogit) maxLogit = v;
        }
        float sum = 0.0f;
        for (int i = 0; i < logits.length; i++) {
            probabilities[i] = (float) Math.exp(logits[i] - maxLogit);
            sum += probabilities[i];
        }
        for (int i = 0; i < logits.length; i++) {
            probabilities[i] /= sum;
        }
        return probabilities;
    }

    @Override
    public void close() throws Exception {
        session.close();
        env.close();
        tokenizer.close();
    }
}
```
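As a quick sanity check of the softmax implementation above, here is a tiny worked example (the snippet is illustrative, not part of the repo):

```java
// exp(2-2) = 1.0 and exp(1-2) ≈ 0.3679, summing to ≈ 1.3679,
// so the probabilities come out to roughly 0.731 and 0.269.
float[] probs = SpamDetectionService.softmax(new float[]{2.0f, 1.0f});
// probs ≈ {0.731f, 0.269f} -- softmax output always sums to 1
```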
You may note that the paths have default values under /models; that's because we intend to run this from a Docker container by default.
However, you can customize the paths to these files using the following configuration in a Spring Boot configuration file, e.g. in application.yaml:
```yaml
# application.yaml
model:
  path: "/path/to/models/model.onnx"
tokenizer:
  path: "/path/to/models/tokenizer.json"
```
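For completeness, the two libraries the service depends on are Microsoft's ONNX Runtime for Java and DJL's HuggingFace tokenizers, both on Maven Central. The coordinates look like this (the version numbers are illustrative; check Maven Central for the latest releases):

```xml
<!-- ONNX Runtime for Java (CPU build) -->
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime</artifactId>
    <version>1.19.2</version>
</dependency>
<!-- DJL HuggingFace tokenizers, provides HuggingFaceTokenizer -->
<dependency>
    <groupId>ai.djl.huggingface</groupId>
    <artifactId>tokenizers</artifactId>
    <version>0.30.0</version>
</dependency>
```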
Running the service via Docker
The project in the repository uses Jib to build a Docker image from the Java source code. Run the following command to build the image; by default it will be named zikani03/spam-detection-with-onnx:
$ ./mvnw clean jib:dockerBuild
Once the build completes successfully, you can run a Docker container with the following command, binding port 8080, which the API listens on inside the container.
$ docker run -p "8080:8080" zikani03/spam-detection-with-onnx
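If your image doesn't bundle the model files, or you want to swap in different ones, you can mount a local directory over /models inside the container (the host path here is a placeholder; adjust it to wherever you downloaded the files):
$ docker run -p "8080:8080" -v /path/to/your/models:/models zikani03/spam-detection-with-onnx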
Once that's running, you can test the SPAM detection service using your favourite HTTP client, e.g. Postman, Insomnia, or even just cURL:
$ curl -X POST -H "Content-Type: application/json" -d '{"requestId":"test","content":"Cһeck out our amazinɡ bооѕting serviсe ѡhere you can get to Leveӏ 3 for 3 montһs for just 20 USD.","token":"abc"}' "http://localhost:8080/api/spam/check"
You should get a result similar to this:
{"result":"SCAM","confidence":99.99815368652344,"id":"test","checkDurationMillis":149}
I like to load test things with hey, and the results were not bad:
![Load test results from hey](https://cdn.hashnode.com/res/hashnode/image/upload/v1770586347231/b7530ac8-b6e3-4aab-9eae-4f25365ea85b.png)
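The exact hey invocation isn't shown above, but it would be along these lines (the flags and request counts are illustrative; adjust to taste):
$ hey -n 1000 -c 50 -m POST -T "application/json" -d '{"requestId":"test","content":"hello world","token":"abc"}' http://localhost:8080/api/spam/check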
The performance is okay, considering this is all running on a CPU and not a GPU (which I'm sure you can use with the onnxruntime libraries).
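If you do have a GPU available, ONNX Runtime's Java API exposes execution providers through the session options. Here is a minimal sketch, assuming you swap the CPU onnxruntime dependency for the GPU build (com.microsoft.onnxruntime:onnxruntime_gpu) and have a compatible CUDA installation; this is not part of the repo:

```java
// Illustrative only: enable the CUDA execution provider on device 0.
// addCUDA throws an OrtException if the CUDA provider isn't available.
OrtEnvironment env = OrtEnvironment.getEnvironment();
OrtSession.SessionOptions options = new OrtSession.SessionOptions();
options.addCUDA(0);
OrtSession session = env.createSession("/models/model.onnx", options);
```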
Conclusion
I have been curious about doing Machine Learning with Java for a while. I ran into ONNX while trying out some Python stuff and got curious whether I could leverage ONNX models in Java, and of course you can! Microsoft's onnxruntime for Java is a great place to start.
Sure, there is a lot more to add to this project to make it a real production-grade service, but I hope I have illustrated how it is possible to do some inference with Java and ONNX models. There are many models out there which you can leverage for different use cases.
I hope you are as excited about doing ML in Java as I am.