vecdb

package
v0.9.1 (Latest)
Warning

This package is not in the latest version of its module.

Published: Jan 11, 2026 License: MIT Imports: 12 Imported by: 0

README

vecdb

In-memory vector indices for Go with Flat (exact) and HNSW (approximate) search.

Features

  • Generics over ID types.
  • Support for L2 squared and cosine distance.
  • Flat and HNSW indices with shared API.
  • Tests, fuzzing, and benchmarks included.
  • Vectors use float32 and leverage SIMD via github.com/viterin/vek/vek32 on supported CPUs.
  • No cgo required; SIMD uses Go asm with a pure-Go fallback.

Flat vs HNSW

  • Flat performs exact search over all vectors (O(n) distance computations per query). Use it for small datasets, tight correctness requirements, or when build time must be minimal.
  • HNSW is approximate search with faster queries on larger datasets, at the cost of more memory and slower inserts/builds. Use it when you can trade some recall for speed.
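To make the Flat trade-off concrete, the hypothetical `flatSearch` below computes a distance to every stored vector and keeps the k smallest, which is what makes it exact and O(n) per query. It is a sketch, not vecdb's implementation; squared L2 is used as the metric and `result` only mirrors the shape of vecdb's Result.

```go
package main

import "sort"

// result mirrors the shape of vecdb.Result: an ID plus a distance score.
type result[ID comparable] struct {
	ID    ID
	Score float32
}

// l2Sq is the squared Euclidean distance.
func l2Sq(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// flatSearch scans every stored vector, sorts by score (lower is closer),
// and keeps the k best. ids[i] identifies vecs[i].
func flatSearch[ID comparable](ids []ID, vecs [][]float32, query []float32, k int) []result[ID] {
	out := make([]result[ID], 0, len(ids))
	for i, v := range vecs {
		out = append(out, result[ID]{ID: ids[i], Score: l2Sq(v, query)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score < out[j].Score })
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}
```

A bounded max-heap of size k would avoid sorting all n scores; the full sort just keeps the sketch short.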

Usage

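package main

import (
	"fmt"
	"log"

	"github.com/delaneyj/toolbelt/vecdb"
)

func main() {
	// 2-dimensional index using cosine distance.
	idx := vecdb.NewHNSW[string](2,
		vecdb.WithMetric(vecdb.MetricCosine),
		vecdb.WithEFConstruction(200),
		vecdb.WithEFSearch(50),
	)

	// Add returns ErrIDExists if the id is already present.
	if err := idx.Add("a", 1, 0); err != nil {
		log.Fatal(err)
	}
	if err := idx.Add("b", 0, 1); err != nil {
		log.Fatal(err)
	}

	// Two nearest neighbors of the query (1, 0); a lower Score is closer.
	for _, r := range idx.Search(2, 1, 0) {
		fmt.Println(r.ID, r.Score)
	}

	// Pull toward (1, 0) and slightly away from (0, 1).
	weighted := idx.SearchWeighted(2,
		vecdb.WeightedQuery{Weight: 1, Vector: []float32{1, 0}},
		vecdb.WeightedQuery{Weight: -0.25, Vector: []float32{0, 1}},
	)
	for _, r := range weighted {
		fmt.Println(r.ID, r.Score)
	}
}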

API highlights

  • Add(id, vector...) inserts a new vector and returns ErrIDExists if the id already exists.
  • Upsert(id, vector...) inserts or updates.
  • BatchUpsert(ids, vectors) inserts or updates multiple vectors.
  • Delete(id) removes by id.
  • Clear(keepCapacity) removes all vectors, optionally keeping backing storage.
  • Vector(id) returns a copy of the vector.
  • ColumnName(dim), ColumnNames(), SetColumnName(dim, name), and SetColumnNames(names...) get/set per-dimension column names (0-based).
  • Save(w, ...PersistOption) and Load(r, ...PersistOption) persist and restore indices (HNSW includes graph structure).
  • Search(k, vector...) returns the k closest neighbors.
  • SearchWithOptions(k, query, ...SearchOption) applies per-query options; query is a []float32.
  • SearchWeighted(k, queries...) searches with weighted query vectors, normalized by the sum of absolute weights (negative weights allowed); SearchWeightedWithOptions(k, queries, ...SearchOption) does the same with per-query options.

Generics

NewHNSW[ID](dim, ...Option) and NewFlat[ID](dim, ...Option) each take a single type parameter:

  • ID: a comparable identifier used as the primary key for update/delete and lookup. Vector components are float32 only.

Example:

// IDs are strings, vectors are float32.
idx := vecdb.NewHNSW[string](384)

Options

  • WithMetric: choose MetricL2Squared or MetricCosine.
  • WithM, WithEFConstruction, WithEFSearch: HNSW tuning.
  • WithSeed or WithRNG: control the random source for HNSW level generation.
  • WithColumnNames: set per-dimension column names (0-based).

Per-query search options (SearchOption, passed to SearchWithOptions and SearchWeightedWithOptions):

  • WithFilter: per-query filter on id.
  • WithEF: per-query override for HNSW ef.

Notes

  • This package is in-memory only with explicit Save/Load persistence.
  • Distances are returned in Result.Score; lower is better.
  • HNSW deletes are tombstones; memory is not compacted.
  • Persistence uses a default ID codec for strings, bools, and numeric types; provide WithIDCodec for custom IDs.

Benchmarks

  • Search: go test ./vecdb -bench=Search -benchmem -benchtime=1s -count=3 on an AMD Ryzen 9 6900HX (linux/amd64).
  • Build: go test ./vecdb -bench=Build -benchmem -benchtime=1s -count=3 on the same machine.

Both indices are benchmarked with vector dim = 20, 20,000 vectors, and 100 queries. Each benchmark op runs 80 searches for both Flat and HNSW, which keeps the slowest op around 1 s on the reference machine without SIMD. Build time is measured by constructing a new index with 20,000 vectors (same dim and parameters as above). Index heap is an approximate heap delta measured after building the index in a fresh process (not per op). At dim 20, SIMD results are noisy with only 3 samples; recent runs show Flat roughly unchanged and HNSW faster with vek32.

Search benchmark   Vectors   time/op   B/op        allocs/op   Index heap (overall)
FlatSearch         20,000    870 ms    25.01 MiB   400         2.78 MiB
HNSWSearch         20,000    59.9 ms   752 KiB     38.6k       7.57 MiB

Build benchmark    Vectors   time/op   B/op        allocs/op
FlatBuild          20,000    13.0 ms   5.46 MiB    20.2k
HNSWBuild          20,000    41.3 s    1.38 GiB    30.4M

Tasks

  • task test: run package tests
  • task fuzz: run fuzzers (override with FUZZTIME=2m)
  • task bench: run benchmarks

Documentation

Index

Constants

This section is empty.

Variables

var (
	ErrIDExists            = errors.New("vecdb: id already exists")
	ErrBatchSizeMismatch   = errors.New("vecdb: batch length mismatch")
	ErrDimMismatch         = errors.New("vecdb: dimension mismatch")
	ErrEmptyVector         = errors.New("vecdb: empty vector")
	ErrInvalidFormat       = errors.New("vecdb: invalid persistence format")
	ErrInvalidColumnIndex  = errors.New("vecdb: invalid column index")
	ErrColumnNamesMismatch = errors.New("vecdb: column names length mismatch")
	ErrUnsupportedIDType   = errors.New("vecdb: unsupported id type for persistence")
	ErrUnsupportedVersion  = errors.New("vecdb: unsupported persistence version")
)

Functions

This section is empty.

Types

type Flat

type Flat[ID comparable] struct {
	// contains filtered or unexported fields
}

Flat is a brute-force in-memory vector index.

func NewFlat

func NewFlat[ID comparable](dim int, opts ...Option) *Flat[ID]

NewFlat creates a flat index. If dim is zero, the first insert sets the dimension.
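A hypothetical sketch of the zero-dimension rule: the first non-empty vector fixes the dimension and later vectors must match it. Mapping empty input and later mismatches to errors modeled on ErrEmptyVector and ErrDimMismatch is an assumption based on those error names; `dimGuard` is illustrative only.

```go
package main

import "errors"

// Local stand-ins for vecdb.ErrEmptyVector and vecdb.ErrDimMismatch.
var (
	errEmptyVector = errors.New("empty vector")
	errDimMismatch = errors.New("dimension mismatch")
)

// dimGuard models "dim == 0 means the first insert sets the dimension".
type dimGuard struct{ dim int }

// check validates v and, if no dimension is set yet, adopts len(v).
func (g *dimGuard) check(v []float32) error {
	if len(v) == 0 {
		return errEmptyVector
	}
	if g.dim == 0 {
		g.dim = len(v) // first insert fixes the dimension
		return nil
	}
	if len(v) != g.dim {
		return errDimMismatch
	}
	return nil
}
```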

func (*Flat[ID]) Add

func (f *Flat[ID]) Add(id ID, vector ...float32) error

Add inserts a new vector. Returns ErrIDExists if id already exists.

func (*Flat[ID]) BatchUpsert added in v0.8.7

func (f *Flat[ID]) BatchUpsert(ids []ID, vectors [][]float32) error

BatchUpsert inserts or updates multiple vectors.

func (*Flat[ID]) Clear added in v0.8.3

func (f *Flat[ID]) Clear(keepCapacity bool)

Clear removes all vectors from the index. If keepCapacity is true, backing storage is retained for reuse.
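The keepCapacity trade-off can be sketched with plain Go slices and maps: resetting in place keeps allocated memory for the next fill, while replacing releases it to the garbage collector. The helpers are hypothetical and do not describe vecdb's internal storage.

```go
package main

// clearSlice empties s. With keepCapacity, the backing array is kept
// (length 0, same capacity); otherwise nil is returned so it can be freed.
func clearSlice(s []float32, keepCapacity bool) []float32 {
	if keepCapacity {
		return s[:0]
	}
	return nil
}

// clearMap empties m. Deleting every key keeps the map's allocated buckets,
// so refilling it does not regrow; a fresh map drops them.
func clearMap[K comparable, V any](m map[K]V, keepCapacity bool) map[K]V {
	if keepCapacity {
		for k := range m {
			delete(m, k)
		}
		return m
	}
	return make(map[K]V)
}
```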

func (*Flat[ID]) ColumnName added in v0.8.5

func (f *Flat[ID]) ColumnName(dim int) (string, bool)

ColumnName returns the associated column name for the given dimension (0-based).

func (*Flat[ID]) ColumnNames added in v0.8.6

func (f *Flat[ID]) ColumnNames() []string

ColumnNames returns a copy of the associated column names, indexed by dimension (0-based). Unset names are returned as empty strings.

func (*Flat[ID]) Delete

func (f *Flat[ID]) Delete(id ID) bool

Delete removes a vector by id.

func (*Flat[ID]) Dim

func (f *Flat[ID]) Dim() int

Dim returns the configured dimension. Zero means unset.

func (*Flat[ID]) Len

func (f *Flat[ID]) Len() int

Len returns the number of stored vectors.

func (*Flat[ID]) Load added in v0.8.4

func (f *Flat[ID]) Load(r io.Reader, opts ...PersistOption[ID]) error

Load replaces the flat index with data read from r.

func (*Flat[ID]) Metric

func (f *Flat[ID]) Metric() Metric

Metric returns the configured distance metric.

func (*Flat[ID]) Save added in v0.8.4

func (f *Flat[ID]) Save(w io.Writer, opts ...PersistOption[ID]) error

Save writes the flat index to w.

func (*Flat[ID]) Search

func (f *Flat[ID]) Search(k int, query ...float32) []Result[ID]

Search returns the k closest vectors to query.

func (*Flat[ID]) SearchWeighted

func (f *Flat[ID]) SearchWeighted(k int, queries ...WeightedQuery) []Result[ID]

SearchWeighted returns the k closest vectors to the weighted query sum, normalizing weights by the sum of absolute weights.

func (*Flat[ID]) SearchWeightedWithOptions

func (f *Flat[ID]) SearchWeightedWithOptions(k int, queries []WeightedQuery, opts ...SearchOption[ID]) []Result[ID]

SearchWeightedWithOptions returns the k closest vectors to the weighted query sum with options applied.

func (*Flat[ID]) SearchWithOptions

func (f *Flat[ID]) SearchWithOptions(k int, query []float32, opts ...SearchOption[ID]) []Result[ID]

SearchWithOptions returns the k closest vectors to query with options applied.

func (*Flat[ID]) SetColumnName added in v0.8.5

func (f *Flat[ID]) SetColumnName(dim int, name string) error

SetColumnName sets the associated column name for the given dimension (0-based).

func (*Flat[ID]) SetColumnNames added in v0.8.7

func (f *Flat[ID]) SetColumnNames(names ...string) error

SetColumnNames replaces all associated column names, indexed by dimension (0-based).

func (*Flat[ID]) Upsert

func (f *Flat[ID]) Upsert(id ID, vector ...float32) error

Upsert inserts or updates a vector.

func (*Flat[ID]) Vector

func (f *Flat[ID]) Vector(id ID) ([]float32, bool)

Vector returns a copy of the vector for an id, if present.

type HNSW

type HNSW[ID comparable] struct {
	// contains filtered or unexported fields
}

HNSW is an approximate in-memory vector index.

func NewHNSW

func NewHNSW[ID comparable](dim int, opts ...Option) *HNSW[ID]

NewHNSW creates a new HNSW index. If dim is zero, the first insert sets the dimension. Type parameters:

  • ID: a comparable identifier used as the primary key for update/delete/lookup.

func (*HNSW[ID]) Add

func (h *HNSW[ID]) Add(id ID, vector ...float32) error

Add inserts a new vector. Returns ErrIDExists if id already exists.

func (*HNSW[ID]) BatchUpsert added in v0.8.7

func (h *HNSW[ID]) BatchUpsert(ids []ID, vectors [][]float32) error

BatchUpsert inserts or updates multiple vectors. Updates are implemented as delete + add.

func (*HNSW[ID]) Clear added in v0.8.3

func (h *HNSW[ID]) Clear(keepCapacity bool)

Clear removes all vectors from the index. If keepCapacity is true, backing storage is retained for reuse.

func (*HNSW[ID]) ColumnName added in v0.8.5

func (h *HNSW[ID]) ColumnName(dim int) (string, bool)

ColumnName returns the associated column name for the given dimension (0-based).

func (*HNSW[ID]) ColumnNames added in v0.8.6

func (h *HNSW[ID]) ColumnNames() []string

ColumnNames returns a copy of the associated column names, indexed by dimension (0-based). Unset names are returned as empty strings.

func (*HNSW[ID]) Delete

func (h *HNSW[ID]) Delete(id ID) bool

Delete removes a vector by id.

func (*HNSW[ID]) Dim

func (h *HNSW[ID]) Dim() int

Dim returns the configured dimension. Zero means unset.

func (*HNSW[ID]) Len

func (h *HNSW[ID]) Len() int

Len returns the number of live vectors.

func (*HNSW[ID]) Load added in v0.8.4

func (h *HNSW[ID]) Load(r io.Reader, opts ...PersistOption[ID]) error

Load replaces the HNSW index with data read from r.

func (*HNSW[ID]) Metric

func (h *HNSW[ID]) Metric() Metric

Metric returns the configured distance metric.

func (*HNSW[ID]) Save added in v0.8.4

func (h *HNSW[ID]) Save(w io.Writer, opts ...PersistOption[ID]) error

Save writes the HNSW index to w, preserving graph structure.

func (*HNSW[ID]) Search

func (h *HNSW[ID]) Search(k int, query ...float32) []Result[ID]

Search returns the k closest vectors to query.

func (*HNSW[ID]) SearchWeighted

func (h *HNSW[ID]) SearchWeighted(k int, queries ...WeightedQuery) []Result[ID]

SearchWeighted returns the k closest vectors to the weighted query sum, normalizing weights by the sum of absolute weights.

func (*HNSW[ID]) SearchWeightedWithOptions

func (h *HNSW[ID]) SearchWeightedWithOptions(k int, queries []WeightedQuery, opts ...SearchOption[ID]) []Result[ID]

SearchWeightedWithOptions returns the k closest vectors to the weighted query sum with options applied.

func (*HNSW[ID]) SearchWithOptions

func (h *HNSW[ID]) SearchWithOptions(k int, query []float32, opts ...SearchOption[ID]) []Result[ID]

SearchWithOptions returns the k closest vectors to query with options applied.

func (*HNSW[ID]) SetColumnName added in v0.8.5

func (h *HNSW[ID]) SetColumnName(dim int, name string) error

SetColumnName sets the associated column name for the given dimension (0-based).

func (*HNSW[ID]) SetColumnNames added in v0.8.7

func (h *HNSW[ID]) SetColumnNames(names ...string) error

SetColumnNames replaces all associated column names, indexed by dimension (0-based).

func (*HNSW[ID]) Upsert

func (h *HNSW[ID]) Upsert(id ID, vector ...float32) error

Upsert inserts or updates a vector. Updates are implemented as delete + add.

func (*HNSW[ID]) Vector

func (h *HNSW[ID]) Vector(id ID) ([]float32, bool)

Vector returns a copy of the vector for an id, if present.

type IDCodec added in v0.8.4

type IDCodec[ID comparable] interface {
	Encode(w io.Writer, id ID) error
	Decode(r io.Reader) (ID, error)
}

IDCodec handles serialization of ID values.
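A sketch of a custom codec for a composite ID that the default codec does not cover. `docID` and `docIDCodec` are hypothetical; the interface is copied from the declaration above so the block compiles on its own, and fixed-width little-endian encoding via encoding/binary is one reasonable choice, not a format vecdb requires.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
)

// IDCodec repeats the documented interface so this sketch compiles alone.
type IDCodec[ID comparable] interface {
	Encode(w io.Writer, id ID) error
	Decode(r io.Reader) (ID, error)
}

// docID is a hypothetical composite ID: a document plus a chunk number.
type docID struct {
	Doc   uint32
	Chunk uint16
}

// docIDCodec writes both fields as fixed-width little-endian integers.
type docIDCodec struct{}

func (docIDCodec) Encode(w io.Writer, id docID) error {
	return binary.Write(w, binary.LittleEndian, id)
}

func (docIDCodec) Decode(r io.Reader) (docID, error) {
	var id docID
	err := binary.Read(r, binary.LittleEndian, &id)
	return id, err
}

// Compile-time check that docIDCodec satisfies IDCodec[docID].
var _ IDCodec[docID] = docIDCodec{}

// roundTrip encodes then decodes id through an in-memory buffer.
func roundTrip(c IDCodec[docID], id docID) (docID, error) {
	var buf bytes.Buffer
	if err := c.Encode(&buf, id); err != nil {
		return docID{}, err
	}
	return c.Decode(&buf)
}
```

With an index of type *vecdb.HNSW[docID], the same codec would be passed to both Save and Load, for example idx.Save(w, vecdb.WithIDCodec[docID](docIDCodec{})), so both sides agree on the byte layout.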

type Metric

type Metric int

Metric defines how distances are computed.

const (
	MetricL2Squared Metric = iota
	MetricCosine
)

type Option

type Option func(*config)

Option configures an index at construction time.
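Option follows Go's functional-options pattern. The self-contained sketch below shows how such options are typically applied; the `config` fields and the default values are placeholders, since vecdb's config type is unexported.

```go
package main

// config is hypothetical; vecdb's real fields and defaults are unexported.
type config struct {
	m              int
	efConstruction int
	efSearch       int
}

// Option mirrors the documented shape: a function that mutates a config.
type Option func(*config)

func WithM(m int) Option          { return func(c *config) { c.m = m } }
func WithEFSearch(ef int) Option { return func(c *config) { c.efSearch = ef } }

// newConfig starts from placeholder defaults and applies options in order.
func newConfig(opts ...Option) config {
	c := config{m: 16, efConstruction: 200, efSearch: 50} // placeholder defaults
	for _, opt := range opts {
		opt(&c)
	}
	return c
}
```

Because options run in order, a repeated option overrides the earlier one in this sketch.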

func WithColumnNames added in v0.8.5

func WithColumnNames(names ...string) Option

WithColumnNames sets the associated vector column names by dimension (0-based).

func WithEFConstruction

func WithEFConstruction(ef int) Option

WithEFConstruction sets the efConstruction parameter for HNSW insertions.

func WithEFSearch

func WithEFSearch(ef int) Option

WithEFSearch sets the default efSearch parameter for HNSW queries.

func WithM

func WithM(m int) Option

WithM configures the maximum number of neighbors per layer for HNSW.

func WithMetric

func WithMetric(metric Metric) Option

WithMetric sets the distance metric.

func WithRNG

func WithRNG(rng *rand.Rand) Option

WithRNG sets the random source used for HNSW level generation.

func WithSeed

func WithSeed(seed int64) Option

WithSeed sets the random seed for HNSW level generation.

type PersistOption added in v0.8.4

type PersistOption[ID comparable] func(*persistOptions[ID])

PersistOption configures Save/Load behavior.

func WithIDCodec added in v0.8.4

func WithIDCodec[ID comparable](codec IDCodec[ID]) PersistOption[ID]

WithIDCodec overrides ID encoding/decoding for persistence.

type Result

type Result[ID comparable] struct {
	ID    ID
	Score float32
}

Result is a nearest-neighbor search result.

type SearchOption

type SearchOption[ID comparable] func(*searchOptions[ID])

SearchOption configures search behavior.

func WithEF

func WithEF[ID comparable](ef int) SearchOption[ID]

WithEF overrides efSearch for a single HNSW query.

func WithFilter

func WithFilter[ID comparable](filter func(id ID) bool) SearchOption[ID]

WithFilter filters candidates by ID.
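A sketch of filtering at candidate-collection time for exact search: rejected ids are skipped before ranking, so they never occupy one of the k slots. `filteredTopK` and the precomputed `scores` slice are hypothetical; how vecdb's HNSW applies the filter during graph traversal is not documented here.

```go
package main

import "sort"

// result mirrors the shape of vecdb.Result.
type result[ID comparable] struct {
	ID    ID
	Score float32
}

// filteredTopK keeps ids accepted by keep (nil keeps all), ranks them by
// score (lower is closer), and returns at most k. scores[i] belongs to ids[i].
func filteredTopK[ID comparable](ids []ID, scores []float32, k int, keep func(ID) bool) []result[ID] {
	var out []result[ID]
	for i, id := range ids {
		if keep != nil && !keep(id) {
			continue // filtered out before it can take a slot
		}
		out = append(out, result[ID]{ID: id, Score: scores[i]})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score < out[j].Score })
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}
```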

type WeightedQuery

type WeightedQuery struct {
	Weight float32
	Vector []float32
}

WeightedQuery is a query vector scaled by Weight. SearchWeighted normalizes weights by the sum of absolute weights. Negative weights are allowed.
