JC-AI Newsletter #7

  • October 14, 2025
  • 170 Unique Views
  • 5 min read

Fourteen days have passed, and it is time to present a fresh collection of readings that could influence developments in the field of artificial intelligence.

Beyond focused tutorials that can enhance your understanding of AI applications, this newsletter concentrates on hallucination, Java code generation, testing, agentic system architecture, and LLM benchmarking methodologies designed to ensure model accuracy and competency in handling complex contextual information.

The world influenced by LLMs is changing very quickly, so let's get started...

article: The Missing Layer in AI Infrastructure: Aggregating Agentic Traffic
authors: Eyal Solomon
date: 2025-08-22
desc.: Agentic AI systems introduce new challenges across the entire system architecture. Beyond what research articles address, several critical issues remain unresolved and may pose serious risks, particularly in system architecture where the adoption of LLMs has triggered a major paradigm shift in system design. A key concern involves outbound API calls made by autonomously acting AI agents (e.g., chaining tools, calling external services). Current infrastructure, including API gateways and service meshes, is primarily designed around inbound traffic or service-to-service communication, rather than managing agent-initiated outbound calls. This creates a significant blind spot in our architectural oversight.
category: architecture

article: Learning to Reason for Hallucination Span Detection
authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh and others
date: 2025-10-02
desc.: This paper addresses the challenge of advancing from simple binary classification to fine-grained span-level hallucination detection. In-domain chain-of-thought (CoT) reasoning proves essential for robust hallucination detection. The normalization step in Group Relative Policy Optimization turns out to be crucial, as simple reward-rescaling policies cannot effectively mitigate reward hacking on the dataset employed. The paper proposes a reinforcement learning framework with span-level rewards that aligns large language model (LLM) reasoning with hallucination detection tasks on the RAGTruth benchmark. The research was conducted during an internship at Apple.
category: research

article: Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage
authors: Siddhant Arora, Haidar Khan, Kai Sun, Xin Luna Dong, Sajal Choudhary, Seungwhan Moon, Xinyuan Zhang and others
date: 2025-10-02
desc.: End-to-end speech recognition and fluent answering without noticeable pauses present significant challenges for utilizing LLMs in dialogue-based agentic systems. These systems are prone to hallucination effects caused by various factors. While improving input/output accuracy through Retrieval-Augmented Generation (RAG) can mitigate hallucination effects and significantly increase accuracy, this comes with penalties such as increased resource requirements and latency. The paper proposes a model-triggered Stream RAG approach as an alternative to fixed-interval RAG streaming or no retrieval at all. Although the paper does not provide a complete solution to these challenges, it proposes a benchmarking strategy for future research and highlights key achievements. This research was conducted in cooperation with Meta.
category: research

article: Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment
authors: Hongxiang Zhang, Yuan Tian, Tianyi Zhang
date: 2025-10-03
desc.: While prompting strategies show effectiveness in certain tasks, they lack robustness across different benchmarks and model architectures, performing better on larger LLMs and simpler reasoning problems. This paper proposes a Self-Anchor mechanism for structured reasoning with automatic anchoring. Self-Anchor delivers consistent improvements across tasks, model sizes, and architectures, demonstrating strong robustness and effectiveness. The approach leverages inherent structure in reasoning chains to improve attention alignment and enhance reasoning capabilities. However, Self-Anchor primarily addresses attention misalignment without fully resolving deeper issues related to logical validity, semantic understanding, or computational precision.
category: research

article: Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
authors: José Cambronero, Michele Tufano, Sherry Shi, Renyao Wei, Grant Uy, Runxiang Cheng and others
date: 2025-10-03
desc.: Agentic Automated Program Repair (APR) is increasingly addressing complex, repository-level bugs in industry settings. However, agent-generated patches still require human review before deployment to ensure they properly resolve the underlying issues. This paper introduces two complementary LLM-based policies for patch assessment. The paper addresses and discusses limitations in automated patch procedures, human supervision requirements, and company-specific bug-fixing approaches. This paper results from cooperation between Google and Meta.
category: research

article: Cache-to-Cache: Direct Semantic Communication Between Large Language Models
authors: Tianyu Fu, Zihan Min, Hanling Zhang, Jichao Yan, Guohao Dai, Wanli Ouyang, Yu Wang
date: 2025-10-03
desc.: Rather than relying on text-to-text communication between LLM-based systems, which incurs latency penalties, this paper proposes the Cache-to-Cache paradigm for direct inter-system communication. Experimental results demonstrate improved efficiency and performance without requiring additional cache capacity.
category: research

article: FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
authors: Victor May, Diganta Misra, Yanqi Luo, Anjali Sridhar, Justine Gehring, Silvio Soares Ribeiro Junior
date: 2025-10-06
desc.: This paper introduces the FreshBrew approach for Java code migration tasks utilizing agentic LLM systems. The migration experiments from JDK 8 to JDK 17 and JDK 21 demonstrate the limitations of current LLM implementations, even when integrated with modern deterministic migration tools such as OpenRewrite. Although the overall migration success rate was approximately 50%, the paper provides a comprehensive discussion of the associated limitations and challenges. This work was done in cooperation with the Max Planck Institute, Google, and Salesforce.
category: research

article: Investigating The Smells of LLM Generated Code
authors: Debalina Ghosh Paul, Hong Zhu, Ian Bayley
date: 2025-10-03
desc.: The paper proposes a scenario-based method for evaluating the quality of LLM-generated code, as such models are increasingly utilized for program code generation. The study experiments with Java programs using Gemini Pro, ChatGPT, Codex, and Falcon LLMs to obtain results. The paper highlights that for moderately advanced topics, particularly those involving object-oriented programming concepts, the generated code quality is noticeably poorer than human-written code.
category: research
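As a hypothetical illustration (the class names and the specific smell are my own choice, not taken from the paper), the encapsulation problems that such studies flag in generated object-oriented Java often look like a public mutable field, fixed by a validated, read-only refactor:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Smelly version: a public mutable field exposes the internal
// representation, so any caller can corrupt the cart's state.
class SmellyCart {
    public List<Double> prices = new ArrayList<>();

    public double total() {
        double sum = 0;
        for (double p : prices) sum += p;
        return sum;
    }
}

// Refactor: state is private, mutation goes through a validated
// method, and reads get an unmodifiable view.
class Cart {
    private final List<Double> prices = new ArrayList<>();

    public void add(double price) {
        if (price < 0) throw new IllegalArgumentException("negative price");
        prices.add(price);
    }

    public List<Double> prices() {
        return Collections.unmodifiableList(prices); // read-only view
    }

    public double total() {
        return prices.stream().mapToDouble(Double::doubleValue).sum();
    }
}

public class SmellDemo {
    public static void main(String[] args) {
        Cart cart = new Cart();
        cart.add(9.99);
        cart.add(5.01);
        System.out.println("Total: " + cart.total());
    }
}
```

Scenario-based evaluations of the kind the paper describes probe whether a model produces the second shape or the first as the task's object-oriented complexity grows.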

article: Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval?
authors: Lucas Roberts, Denisa Roberts
date: 2025-09-30
desc.: This paper examines the comparative abilities of Large Language Models (LLMs) and human annotators in identifying and annotating specific elements within source code. The study investigates several widely-used programming languages, including C, Java, JavaScript, Go, and Python. The experimental results reveal various limitations and challenges associated with automated code annotation, while proposing possibilities for future research and emphasizing the critical role of human expertise in the annotation process.
category: research

article: AI Coding Tools Blog Post - Model Context Protocol Mastery - Claude, Cursor
authors: Mani Sarkar
date: 2025-10-07
desc.: The post describes and provides guidance on configuring MCP for AI-assisted development using Claude and Cursor. Please be aware of the 'No Warranty' statement and use this as an example only.
category: tutorial
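For orientation, registering an MCP server with Claude Desktop comes down to an entry in its claude_desktop_config.json; a minimal sketch is shown below (the server name and directory path are placeholders, so consult the post and the official MCP documentation for your exact setup):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/allowed/dir"
      ]
    }
  }
}
```

After editing the file, restart Claude Desktop so the client launches the configured server and its tools become available in the chat.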

article: DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern
authors: Lekang Yang, Yuetong Liu, Yitong Zhang, Jia Li
date: 2025-09-29
desc.: Writing high-quality unit tests is often a time-consuming effort that requires extensive knowledge of the business domain. This paper proposes DiffTester, an acceleration framework designed to overcome the limitations imposed by single-token generation constraints. The DiffTester framework identifies common patterns through syntax tree analysis. Experimental results demonstrate that DiffTester can be used to generate a larger number of tokens per step, thereby accelerating test generation while maintaining accuracy.
category: research

article: Deloitte caught out using AI in $440,000 report | 7.30
authors: ABC News In-depth
date: 2025-10-09
desc.: Hallucination remains a significant challenge in current large language models (LLMs). These inaccuracies can cause damage at various levels and require a careful eye to identify.
category: youtube

article: SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents
authors: Longjie Guo, Chenjie Yuan, Mingyuan Zhong, Robert Wolfe, Ruican Zhong, Yue Xu, Bingbing Wen and others
date: 2025-10-13
desc.: In the age of AI, deception through manipulation or hallucination-based choices poses a serious threat to human or system reliability. This paper proposes SusBench, a benchmark for evaluating the susceptibility of computer-use agents (CUAs) and humans to dark patterns that may mislead both users and agents into making harmful decisions. The paper demonstrates that neither humans nor agentic AI systems based on large language models (LLMs) exhibit adequate resistance against dark patterns.
category: research

Previous:
Newsletter vol.1
Newsletter vol.2
Newsletter vol.3
Newsletter vol.4
Newsletter vol.5
Newsletter vol.6



Subscribe to foojay updates:

https://foojay.io/feed/