JVector 1.0

  • October 02, 2023

JVector is a pure Java embedded vector search engine that powers DataStax Astra and is being added to Apache Cassandra.

Vector search is a critical part of today’s generative AI applications: it lets developers quickly retrieve the most relevant context, giving the large language model enough information to answer accurately and without hallucinating. Innovation in this space, however, has mostly happened outside the Java ecosystem. JVector gives enterprises an easy way to capitalize on their investment in the powerful Java platform, and gives Java developers a state-of-the-art solution that is easy to embed in their applications.
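For readers new to vector search, the sketch below shows the operation a vector index accelerates: given a query embedding, return the k stored vectors most similar to it. It is a plain brute-force scan by cosine similarity, not JVector's API; the class and method names are hypothetical, and it assumes non-zero vectors of equal length.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Exact (brute-force) k-nearest-neighbor search by cosine similarity. */
public class BruteForceSearch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** Returns the indices of the k vectors most similar to the query, best first. */
    static int[] topK(float[][] vectors, float[] query, int k) {
        // Score every vector: this full scan is the O(n) cost that vector indexes avoid.
        float[] scores = new float[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            scores[i] = cosine(vectors[i], query);
        }
        // Min-heap on score: the head is the weakest of the current top k.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Integer idx) -> scores[idx]));
        for (int i = 0; i < vectors.length; i++) {
            heap.offer(i);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        // The heap yields ascending scores, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] docs = {
            {1f, 0f, 0f},
            {0.9f, 0.1f, 0f},
            {0f, 1f, 0f},
            {0f, 0f, 1f},
        };
        float[] query = {0.8f, 0.2f, 0f};
        System.out.println(Arrays.toString(topK(docs, query, 2))); // [1, 0]
    }
}
```

Because every query touches every stored vector, this approach stops scaling on large datasets, which is the problem approximate indexes such as HNSW and DiskANN address.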

JVector’s closest relative is Apache Lucene’s vector search. Lucene implements the HNSW vector search algorithm, which is known to be fast but memory-hungry. Because JVector is based on the more sophisticated DiskANN algorithm, it is over 10x faster than Lucene on large datasets, all else being equal. For example, here is a comparison of searching the Deep100M dataset (about 35GB of vectors and 20GB of index data) with Lucene and with JVector:

[Figure: search performance on the Deep100M dataset, Lucene vs. JVector]
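Both HNSW and DiskANN answer a query by walking a graph whose edges connect nearby vectors, rather than scanning every vector. A minimal sketch of that greedy beam search follows, assuming a single-layer in-memory graph, squared Euclidean distance, and a fixed entry point; the names are hypothetical and this is not JVector's code. It leaves out what distinguishes DiskANN, namely how the graph is built and keeping full-precision vectors on disk while navigating with compressed vectors in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Simplified greedy beam search over a single-layer proximity graph. */
public class GraphBeamSearch {

    static float squaredL2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Walks the graph from entry toward query, keeping the beamWidth closest
     * nodes seen so far, and returns the k closest of them, nearest first.
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entry,
                                float[] query, int beamWidth, int k) {
        // Distances are recomputed on every comparison for brevity; real code caches them.
        Comparator<Integer> byDistance =
                Comparator.comparingDouble((Integer node) -> squaredL2(vectors[node], query));
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);         // nearest first
        PriorityQueue<Integer> results = new PriorityQueue<>(byDistance.reversed()); // farthest first
        Set<Integer> visited = new HashSet<>();
        candidates.add(entry);
        results.add(entry);
        visited.add(entry);

        while (!candidates.isEmpty()) {
            int current = candidates.poll();
            // Stop once the best unexpanded node is farther than the worst result kept.
            if (byDistance.compare(current, results.peek()) > 0) {
                break;
            }
            for (int nb : neighbors[current]) {
                if (!visited.add(nb)) {
                    continue; // already seen
                }
                // Keep the neighbor if the beam has room or it beats the worst result.
                if (results.size() < beamWidth || byDistance.compare(nb, results.peek()) < 0) {
                    candidates.add(nb);
                    results.add(nb);
                    if (results.size() > beamWidth) {
                        results.poll(); // drop the farthest
                    }
                }
            }
        }
        List<Integer> sorted = new ArrayList<>(results);
        sorted.sort(byDistance);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // Toy graph: points 0..5 on a line, each linked to its left and right neighbors.
        int n = 6;
        float[][] vectors = new float[n][];
        int[][] neighbors = new int[n][];
        for (int i = 0; i < n; i++) {
            vectors[i] = new float[] {i};
            if (i == 0) neighbors[i] = new int[] {1};
            else if (i == n - 1) neighbors[i] = new int[] {n - 2};
            else neighbors[i] = new int[] {i - 1, i + 1};
        }
        System.out.println(search(vectors, neighbors, 0, new float[] {4.2f}, 2, 2)); // [4, 5]
    }
}
```

A larger beamWidth explores more of the graph before stopping, trading speed for recall.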

JVector is fast, memory-efficient, disk-aware, concurrent, easy to embed, and incremental.

Incremental means that you can query your JVector index while you are still building it: a vector can be found in search results as soon as it is added. There are no batches, microbatches, or “commit” stages to wait for.
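To illustrate what having no commit stage means for a caller, here is a hypothetical sketch (not JVector's API) of an index whose add method makes a vector findable by the very next query. For brevity it scans a ConcurrentHashMap instead of searching a graph.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical incremental index: each add() is visible to the very next search, with no commit step. */
public class IncrementalIndex {
    // Concurrent map so adds and searches may also run on different threads.
    private final Map<Integer, float[]> vectors = new ConcurrentHashMap<>();

    /** Adds (or replaces) a vector; there is nothing to flush or commit afterwards. */
    public void add(int id, float[] vector) {
        vectors.put(id, vector);
    }

    /** Returns the id of the nearest vector by squared Euclidean distance, or -1 if empty. */
    public int nearest(float[] query) {
        int best = -1;
        float bestDistance = Float.POSITIVE_INFINITY;
        for (Map.Entry<Integer, float[]> entry : vectors.entrySet()) {
            float[] v = entry.getValue();
            float distance = 0f;
            for (int i = 0; i < v.length; i++) {
                float diff = v[i] - query[i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        IncrementalIndex index = new IncrementalIndex();
        index.add(1, new float[] {0f, 0f});
        System.out.println(index.nearest(new float[] {5f, 5f})); // 1
        index.add(2, new float[] {5f, 4f});
        System.out.println(index.nearest(new float[] {5f, 5f})); // 2: found right after add()
    }
}
```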

Concurrent means that multiple threads can build and search a JVector index simultaneously. Doubling the number of threads adding vectors cuts build time in half, up to 32 threads.

[Figure: index build time vs. number of threads adding vectors; both axes logarithmic]
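One way several threads can insert into a proximity graph at the same time is sketched below: each insert links the new node to its nearest already-inserted nodes and adds edges in both directions to concurrent sets, so no global lock is needed. This is a deliberately naive, hypothetical builder (brute-force neighbor selection, no edge pruning), not JVector's DiskANN-based construction, and its timings are not expected to reproduce the scaling described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Naive proximity-graph build in which many threads insert nodes concurrently. */
public class ConcurrentGraphBuild {
    final float[][] vectors;
    final int degree;
    // Node id -> neighbor ids; concurrent sets let threads add edges without a global lock.
    final Map<Integer, Set<Integer>> graph = new ConcurrentHashMap<>();

    ConcurrentGraphBuild(float[][] vectors, int degree) {
        this.vectors = vectors;
        this.degree = degree;
    }

    float squaredL2(int a, int b) {
        float sum = 0f;
        for (int i = 0; i < vectors[a].length; i++) {
            float d = vectors[a][i] - vectors[b][i];
            sum += d * d;
        }
        return sum;
    }

    /** Links a node to its nearest already-inserted nodes, adding edges in both directions. */
    void insert(int node) {
        // Snapshot of nodes inserted so far; it may miss nodes other threads are adding right now.
        List<Integer> existing = new ArrayList<>(graph.keySet());
        existing.sort(Comparator.comparingDouble((Integer candidate) -> squaredL2(node, candidate)));
        Set<Integer> edges = ConcurrentHashMap.newKeySet();
        graph.put(node, edges); // the node is now visible to searches and to other inserts
        for (int other : existing.subList(0, Math.min(degree, existing.size()))) {
            edges.add(other);
            graph.get(other).add(node); // back-edge; a real builder would prune to a max degree
        }
    }

    /** Inserts every vector using a fixed-size pool of worker threads. */
    void build(int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < vectors.length; i++) {
            final int node = i;
            pool.execute(() -> insert(node));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random(42);
        float[][] vectors = new float[1_000][8];
        for (float[] v : vectors) {
            for (int i = 0; i < v.length; i++) {
                v[i] = random.nextFloat();
            }
        }
        for (int threads : new int[] {1, 2, 4, 8}) {
            ConcurrentGraphBuild builder = new ConcurrentGraphBuild(vectors, 16);
            long start = System.nanoTime();
            builder.build(threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%d threads: %d nodes in %d ms%n", threads, builder.graph.size(), ms);
        }
    }
}
```

Because each insert reads only a snapshot of the nodes present when it starts, two nodes inserted at the same instant may not link to each other; production builders handle this case more carefully.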

JVector is designed to be straightforward to embed while preserving high performance. The example code that builds the index for the SIFT dataset shown above does all of the following in under 100 lines:

  • Computes a product quantization of the vectors (a form of lossy compression)
  • Loads the vectors into the index in parallel
  • Saves the index to disk
  • Runs searches in parallel against both the in-memory and the on-disk index
  • Computes recall against ground truth and reports performance numbers
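Product quantization, the compression step in the first bullet, and recall, the metric in the last one, can be sketched together. This hypothetical program (not JVector's implementation) splits each vector into M sub-vectors, trains K centroids per subspace with plain Lloyd's k-means, stores each vector as M one-byte centroid ids, and scores codes against a query through a precomputed distance table. It then compares the PQ ranking with an exact ranking to compute recall@k. Assumptions: squared Euclidean distance, a dimension divisible by M, at most 256 centroids per subspace, and a brute-force scan over the codes instead of a graph search.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.IntToDoubleFunction;
import java.util.stream.IntStream;

/** Minimal product quantization (PQ) with table-based approximate distances and recall@k. */
public class ProductQuantization {
    final int subspaces, subDim, clusters; // M sub-vectors, dims per sub-vector, K centroids each
    final float[][][] codebooks;           // [subspace][centroid][dimension]
    ProductQuantization(float[][] data, int subspaces, int clusters, int iterations, long seed) {
        this.subspaces = subspaces;
        this.subDim = data[0].length / subspaces; // assumes the dimension divides evenly
        this.clusters = clusters;                  // must be <= 256 so each code fits in a byte
        this.codebooks = new float[subspaces][][];
        Random random = new Random(seed);
        for (int m = 0; m < subspaces; m++) codebooks[m] = kMeans(data, m * subDim, iterations, random);
    }
    /** Squared L2 distance between v[off, off + subDim) and a centroid. */
    float subDistance(float[] v, int off, float[] centroid) {
        float sum = 0f;
        for (int i = 0; i < subDim; i++) sum += (v[off + i] - centroid[i]) * (v[off + i] - centroid[i]);
        return sum;
    }
    int nearest(float[][] centroids, float[] v, int off) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++)
            if (subDistance(v, off, centroids[c]) < subDistance(v, off, centroids[best])) best = c;
        return best;
    }
    /** Lloyd's k-means on one subspace, seeded with randomly chosen data points. */
    float[][] kMeans(float[][] data, int off, int iterations, Random random) {
        float[][] centroids = new float[clusters][];
        for (int c = 0; c < clusters; c++)
            centroids[c] = Arrays.copyOfRange(data[random.nextInt(data.length)], off, off + subDim);
        for (int it = 0; it < iterations; it++) {
            float[][] sums = new float[clusters][subDim];
            int[] counts = new int[clusters];
            for (float[] v : data) {
                int c = nearest(centroids, v, off);
                counts[c]++;
                for (int i = 0; i < subDim; i++) sums[c][i] += v[off + i];
            }
            for (int c = 0; c < clusters; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid where it is
                for (int i = 0; i < subDim; i++) centroids[c][i] = sums[c][i] / counts[c];
            }
        }
        return centroids;
    }
    /** Compresses a vector to one byte per subspace: the index of its nearest centroid. */
    byte[] encode(float[] v) {
        byte[] code = new byte[subspaces];
        for (int m = 0; m < subspaces; m++) code[m] = (byte) nearest(codebooks[m], v, m * subDim);
        return code;
    }
    /** Query-to-centroid distances, computed once per query; scoring a code is then M lookups. */
    float[][] distanceTable(float[] query) {
        float[][] table = new float[subspaces][clusters];
        for (int m = 0; m < subspaces; m++)
            for (int c = 0; c < clusters; c++) table[m][c] = subDistance(query, m * subDim, codebooks[m][c]);
        return table;
    }
    static float approxDistance(float[][] table, byte[] code) {
        float sum = 0f;
        for (int m = 0; m < code.length; m++) sum += table[m][code[m] & 0xFF]; // & 0xFF: unsigned byte
        return sum;
    }
    /** Indices of the k smallest distances, nearest first. */
    static int[] topK(int n, IntToDoubleFunction distance, int k) {
        return IntStream.range(0, n).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> distance.applyAsDouble(i)))
                .limit(k).mapToInt(Integer::intValue).toArray();
    }
    /** recall@k: the fraction of the true top-k ids that the approximate search also returned. */
    static double recallAtK(int[] truth, int[] found) {
        long hits = Arrays.stream(truth).filter(t -> Arrays.stream(found).anyMatch(f -> f == t)).count();
        return (double) hits / truth.length;
    }

    public static void main(String[] args) {
        Random random = new Random(7);
        int n = 2_000, dim = 16, k = 10;
        float[][] data = new float[n][dim];
        for (float[] v : data) for (int i = 0; i < dim; i++) v[i] = (float) random.nextGaussian();
        float[] query = new float[dim];
        for (int i = 0; i < dim; i++) query[i] = (float) random.nextGaussian();
        ProductQuantization pq = new ProductQuantization(data, 4, 32, 10, 7);
        byte[][] codes = new byte[n][];
        for (int i = 0; i < n; i++) codes[i] = pq.encode(data[i]); // 64 bytes of floats -> 4-byte code
        float[][] table = pq.distanceTable(query);
        int[] exact = topK(n, i -> IntStream.range(0, dim)
                .mapToDouble(d -> (data[i][d] - query[d]) * (data[i][d] - query[d])).sum(), k);
        int[] approx = topK(n, i -> approxDistance(table, codes[i]), k);
        System.out.printf("recall@%d = %.2f%n", k, recallAtK(exact, approx));
    }
}
```

With 16-dimensional vectors and M = 4, each 64-byte vector shrinks to a 4-byte code. The printed recall depends on the data and parameters, so no particular value is implied here.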

JVector runs on JDK 11+ and takes advantage of Panama SIMD acceleration (the incubating Vector API) on JDK 20+. JVector is available under the Apache License 2.0.

Try it out today and let us know what you think!
