codemium

Version: v0.10.0 | Published: Mar 2, 2026 | License: MIT

Repository

github.com/dsablic/codemium

README

Codemium

Generate code statistics across all repositories in a Bitbucket Cloud workspace, GitHub organization, GitHub user account, or GitLab group. Produces per-repo and aggregate metrics including lines of code, comments, blanks, and cyclomatic complexity for 200+ languages.

Features

Analyze all repos in a Bitbucket workspace, GitHub organization, GitHub user account, or GitLab group
Filter by Bitbucket projects, specific repos, or exclusion lists
Per-language breakdown: files, code lines, comments, blanks, complexity
Automatic vendor/generated/binary file filtering for accurate metrics (powered by go-enry)
Per-repo license detection with SPDX identifiers (e.g., MIT, Apache-2.0)
Code churn and hotspot analysis: find files that change most often and are most complex
JSON output to file (default: output/report.json) and optional markdown summary
Parallel processing with configurable concurrency
Progress bar in terminal, plain text fallback in CI/CD
AI code estimation: detect AI-assisted commits via co-author tags, message patterns, and bot authors
Pure Go, no external dependencies at runtime (no git or scc binary needed)

Installation

Homebrew

brew install dsablic/tap/codemium

Pre-built binaries

Download from the releases page.

From source

go install github.com/dsablic/codemium/cmd/codemium@latest

Authentication

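Codemium can authenticate through an interactive login, an existing gh or glab CLI session, or tokens supplied in environment variables. The options available for each provider are listed below.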

Bitbucket

Option 1: API token (interactive)

Create a scoped API token at https://id.atlassian.com/manage-profile/security/api-tokens — click "Create API token with scopes", select Bitbucket as the app, and enable Repository Read
Run:
```
codemium auth login --provider bitbucket
```
This prompts for your Atlassian email and API token. Credentials are verified against the Bitbucket API and stored at ~/.config/codemium/credentials.json.

Option 2: Environment variable (CI/CD)

export CODEMIUM_BITBUCKET_USERNAME=your_email
export CODEMIUM_BITBUCKET_TOKEN=your_api_token

GitHub

Option 1: gh CLI (recommended)

If you already have the GitHub CLI installed and authenticated, codemium uses its token automatically — no extra setup needed:

# If not already authenticated:
gh auth login

# Then just run codemium directly:
codemium analyze --provider github --org myorg

You can also explicitly save the token to codemium's credential store:

codemium auth login --provider github

Option 2: OAuth device flow

If you have a GitHub OAuth App, you can use the device flow instead:

export CODEMIUM_GITHUB_CLIENT_ID=your_client_id
codemium auth login --provider github

This displays a code to enter at github.com/login/device.

Option 3: Environment variable (CI/CD)

export CODEMIUM_GITHUB_TOKEN=your_personal_access_token

Resolution order: CODEMIUM_GITHUB_TOKEN env var > saved credentials > gh auth token CLI.
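
The resolution order can be pictured as a first-match lookup. The sketch below is illustrative Python, not codemium's Go implementation. The function name `resolve_github_token`, the `saved_token` parameter, and the injected `cli_lookup` callable are hypothetical stand-ins for the credential store and the `gh auth token` call, and treating an empty environment variable as unset is an assumption.

```python
from typing import Callable, Mapping, Optional


def resolve_github_token(
    env: Mapping[str, str],
    saved_token: Optional[str],
    cli_lookup: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the first available token, mirroring the documented order."""
    # 1. The environment variable wins when set (assumption: empty means unset).
    token = env.get("CODEMIUM_GITHUB_TOKEN")
    if token:
        return token
    # 2. Otherwise fall back to credentials saved by `codemium auth login`.
    if saved_token:
        return saved_token
    # 3. Finally ask the gh CLI (hypothetical callable wrapping `gh auth token`).
    return cli_lookup() or None
```

Passing the environment and the CLI lookup as arguments keeps the sketch easy to exercise without real credentials. For example, `resolve_github_token({}, "saved", lambda: "cli")` returns `"saved"`.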

GitLab

Option 1: Personal access token (interactive)

Create a personal access token at https://gitlab.com/-/user_settings/personal_access_tokens with the read_api scope
Run:
```
codemium auth login --provider gitlab
```
This prompts for your token, verifies it against the GitLab API, and stores it at ~/.config/codemium/credentials.json.

Option 2: glab CLI

If you have the GitLab CLI installed and authenticated, codemium can use its token automatically:

glab auth login
codemium analyze --provider gitlab --group mygroup

Option 3: Environment variable (CI/CD)

export CODEMIUM_GITLAB_TOKEN=your_personal_access_token

Resolution order: CODEMIUM_GITLAB_TOKEN env var > saved credentials > glab config get token CLI.

Usage

Analyze a Bitbucket workspace

# All repos in a workspace
codemium analyze --provider bitbucket --workspace myworkspace

# Filter by Bitbucket projects
codemium analyze --provider bitbucket --workspace myworkspace --projects PROJ1,PROJ2

# Specific repos only
codemium analyze --provider bitbucket --workspace myworkspace --repos repo1,repo2

# Exclude repos
codemium analyze --provider bitbucket --workspace myworkspace --exclude old-repo,deprecated-repo

Analyze a GitHub organization

# All repos in an org
codemium analyze --provider github --org myorg

# Specific repos
codemium analyze --provider github --org myorg --repos api,frontend

Analyze a GitHub user's repos

# All repos for a user (includes private repos the token has access to)
codemium analyze --provider github --user myuser

# Specific repos
codemium analyze --provider github --user myuser --repos repo1,repo2

Analyze a GitLab group

# All repos in a group (includes subgroups)
codemium analyze --provider gitlab --group mygroup

# Nested group
codemium analyze --provider gitlab --group myorg/mysubgroup

# Specific repos
codemium analyze --provider gitlab --group mygroup --repos api,frontend

Analyze trends over time

The trends command analyzes repositories at historical points in time using git history, showing how codebases evolve over configurable intervals.

# Monthly trends for the past year
codemium trends --provider github --org myorg --since 2025-03 --until 2026-02

# Weekly trends
codemium trends --provider github --org myorg --since 2025-01-01 --until 2025-03-01 --interval weekly

# Output to file, then convert to markdown
codemium trends --provider github --org myorg --since 2025-01 --until 2025-12 --output trends.json
codemium markdown trends.json > trends.md
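
To make the interval idea concrete, the following sketch lists the monthly periods between `--since` and `--until`. It assumes one period per calendar month with both endpoints included. How codemium selects the commit that represents each period is not described here, and `monthly_periods` is a hypothetical helper rather than part of the tool.

```python
from typing import List


def monthly_periods(since: str, until: str) -> List[str]:
    """List YYYY-MM labels from since to until, inclusive (assumed semantics)."""
    start_year, start_month = (int(part) for part in since.split("-")[:2])
    end_year, end_month = (int(part) for part in until.split("-")[:2])
    periods = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        periods.append(f"{year:04d}-{month:02d}")
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return periods
```

For the first example above, `monthly_periods("2025-03", "2026-02")` yields 12 labels, from `2025-03` through `2026-02`.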

Note: For Bitbucket, the trends command requires OAuth credentials (not API tokens), since it needs to clone full git history. Set CODEMIUM_BITBUCKET_CLIENT_ID and CODEMIUM_BITBUCKET_CLIENT_SECRET, then run codemium auth login --provider bitbucket.

Output options

# JSON to default file (output/report.json)
codemium analyze --provider github --org myorg

# JSON to custom file
codemium analyze --provider github --org myorg --output report.json

# Markdown summary
codemium analyze --provider github --org myorg --markdown report.md

# Both
codemium analyze --provider github --org myorg --output report.json --markdown report.md

AI narrative analysis

Generate a rich narrative analysis of your codebase using an AI CLI:

# Auto-detect AI CLI (tries claude, codex, gemini in order)
codemium markdown --narrative report.json

# Use a specific AI CLI
codemium markdown --narrative --ai-cli gemini report.json

# Add custom instructions
codemium markdown --narrative --ai-prompt "Focus on test coverage gaps" report.json

# Load instructions from file
codemium markdown --narrative --ai-prompt-file analysis-prompt.txt report.json

# Works with trends reports too
codemium markdown --narrative trends.json

Requires Claude Code, Codex CLI, or Gemini CLI to be installed and authenticated.

Providing context for better narratives: The AI generates richer analysis when given domain context about your organization. Use --ai-prompt or --ai-prompt-file to describe project areas, team structure, or what specific repos contain:

# Inline context
codemium markdown --narrative --ai-prompt 'Project codes map to these areas:
- SVC = Backend Services
- WEB = Customer-Facing Web Apps
- MOB = Mobile Apps (iOS & Android)
- PLAT = Platform & Infrastructure
- SDK = Public SDKs and Client Libraries

The SVC repos include both microservices and shared libraries.
The PLAT team also maintains CI/CD pipelines.' report.json

# Or load from a file for longer descriptions
codemium markdown --narrative --ai-prompt-file org-context.txt report.json

This is especially useful when Bitbucket project codes or repo naming conventions aren't self-explanatory — the AI will use your descriptions to assign human-readable names and provide more insightful analysis.

Repository health classification

Classify repositories as Active, Maintained, Abandoned, or Failed based on commit history:

# Quick health check (1 API call per repo, no cloning)
codemium analyze --provider github --org myorg --health

# Deep health analysis with author counts, churn, and velocity per window
codemium analyze --provider github --org myorg --health-details

# Limit commits scanned for deep analysis (default: 500)
codemium analyze --provider github --org myorg --health-details --health-commit-limit 200

Health categories:

Active: last commit < 180 days ago
Maintained: 180–365 days ago
Abandoned: > 365 days ago
Failed: commit history could not be fetched (API error, permissions, etc.)
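
The thresholds above translate into a small decision function. This is a minimal sketch, not codemium's code: `classify_health` is a hypothetical name, `None` stands in for a history that could not be fetched, ages are truncated to whole days, and an age of exactly 365 days is assumed to count as Maintained.

```python
from datetime import datetime
from typing import Optional


def classify_health(last_commit: Optional[datetime], now: datetime) -> str:
    """Map the age of the most recent commit to a health category."""
    if last_commit is None:
        # Stand-in for "commit history could not be fetched".
        return "Failed"
    # Both datetimes must be naive, or both timezone-aware.
    age_days = (now - last_commit).days
    if age_days < 180:
        return "Active"
    if age_days <= 365:
        return "Maintained"
    return "Abandoned"
```

Taking `now` as a parameter, rather than reading the clock inside the function, keeps the classification reproducible when reports are regenerated later.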

Persistent cloning

By default, repos are cloned to a temp directory and deleted after analysis. Use --clone to keep them:

# Clone repos to ./repos/<repo-slug>/ and keep them after analysis
codemium analyze --provider github --org myorg --clone ./repos

# Subsequent runs reuse existing clones (no re-download)
codemium analyze --provider github --org myorg --clone ./repos

Secret scanning

Scan repositories for leaked secrets (API keys, tokens, passwords) using gitleaks:

codemium analyze --provider github --org myorg --secrets

Results show per-repo finding counts and which files contain secrets (actual secret values are never included in the report).

Dependency inventory (SBOM)

Generate a software bill of materials for each repository:

codemium analyze --provider github --org myorg --sbom

The report includes per-repo dependency counts with ecosystem breakdown (e.g. go-module, npm, pip).

Rate limiting and error logging

API requests that receive a 429 (Too Many Requests) response are automatically retried with exponential backoff (up to 5 retries). Use --rate-limit to throttle requests proactively and avoid hitting rate limits. For example, --rate-limit 5 allows 5 requests per second, which matches GitLab's 300 req/min raw endpoint limit.
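
The retry behaviour can be sketched as follows. Only the 429 trigger and the cap of 5 retries come from the description above. The starting delay of one second, the doubling schedule, and the names `request_with_backoff` and `send` are assumptions made for illustration.

```python
import time
from typing import Callable, Tuple

MAX_RETRIES = 5  # documented retry cap


def request_with_backoff(
    send: Callable[[], Tuple[int, str]],
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry on HTTP 429 with exponentially growing waits."""
    status, body = send()
    for attempt in range(MAX_RETRIES):
        if status != 429:
            break
        # Assumed schedule: base_delay first, then doubling on each retry.
        sleep(base_delay * (2 ** attempt))
        status, body = send()
    return status, body
```

Here `send` is any callable returning a `(status, body)` pair, so the sketch stays independent of a particular HTTP client. If every attempt is rate limited, the final 429 response is returned after the fifth retry.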

When API errors occur during health classification, AI estimation, or detailed analysis, an error log is automatically written next to the JSON report (e.g., output/report.error.log for output/report.json). Each line is prefixed with a category ([health], [health-details], [ai-estimate], [ai-estimate-detail]) for easy filtering with grep.
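
As an alternative to grep, the log can be filtered from a script. The sketch below derives the log path from the report path as in the example above and keeps only lines for one category. The helper names are hypothetical, and the text that follows each category prefix is not specified here.

```python
from pathlib import Path
from typing import Iterable, List


def error_log_path(report_path: str) -> Path:
    """Derive the error log path by swapping the report's final suffix."""
    return Path(report_path).with_suffix(".error.log")


def filter_category(lines: Iterable[str], category: str) -> List[str]:
    """Keep lines whose prefix is exactly [category]."""
    prefix = f"[{category}]"
    return [line for line in lines if line.startswith(prefix)]
```

Because the closing bracket is part of the prefix, filtering on `health` does not also pick up `[health-details]` lines. A typical call would be `filter_category(error_log_path("output/report.json").read_text().splitlines(), "health")`.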

Additional flags

--concurrency 10            # Parallel workers (default: 5)
--rate-limit 5              # Max API requests per second (default: unlimited)
--include-archived          # Include archived repos (excluded by default)
--include-forks             # Include forked repos (excluded by default)
--ai-estimate               # Estimate AI-generated code via commit history analysis
--ai-commit-limit 200       # Max commits to scan per repo (default: 200)
--health                    # Classify repos by activity level
--health-details            # Deep health analysis (implies --health)
--health-commit-limit 500   # Max commits for health details (default: 500)
--churn                     # Enable code churn and hotspot analysis
--churn-limit 500           # Max commits to scan per repo for churn (default: 500)
--clone ./repos             # Persist cloned repos to directory (reuses existing clones)
--secrets                   # Scan repos for secrets (API keys, tokens, passwords)
--sbom                      # Generate dependency inventory (SBOM) per repository
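
The effect of `--concurrency` can be pictured as a bounded worker pool. The following is a generic Python sketch and not codemium's Go worker pool. `analyze_all` and `analyze_repo` are hypothetical names, and only the default of 5 workers is taken from the flag list above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

DEFAULT_CONCURRENCY = 5  # documented default for --concurrency


def analyze_all(
    repos: Iterable[str],
    analyze_repo: Callable[[str], dict],
    concurrency: int = DEFAULT_CONCURRENCY,
) -> Dict[str, dict]:
    """Run analyze_repo over every repo with at most `concurrency` workers."""
    repo_list = list(repos)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order, so results line up with repo_list.
        results = list(pool.map(analyze_repo, repo_list))
    return dict(zip(repo_list, results))
```

Raising the worker count shortens wall-clock time only while the provider's API limits allow it, which is why the flag is usually tuned together with `--rate-limit`.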

Output Format

JSON

{
  "generated_at": "2026-02-18T12:00:00Z",
  "provider": "github",
  "organization": "myorg",
  "filters": {},
  "repositories": [
    {
      "repository": "my-repo",
      "provider": "github",
      "url": "https://github.com/myorg/my-repo",
      "languages": [
        {
          "name": "Go",
          "files": 42,
          "lines": 5000,
          "code": 3800,
          "comments": 400,
          "blanks": 800,
          "complexity": 120
        }
      ],
      "totals": {
        "files": 42,
        "lines": 5000,
        "code": 3800,
        "comments": 400,
        "blanks": 800,
        "complexity": 120
      }
    }
  ],
  "totals": {
    "repos": 1,
    "files": 42,
    "lines": 5000,
    "code": 3800,
    "comments": 400,
    "blanks": 800,
    "complexity": 120
  },
  "by_language": [
    {
      "name": "Go",
      "files": 42,
      "lines": 5000,
      "code": 3800,
      "comments": 400,
      "blanks": 800,
      "complexity": 120
    }
  ]
}
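
A report in this shape is straightforward to post-process. The sketch below checks that each repository's `totals` equal the sum of its `languages` entries, which holds for the sample above but is an assumption about reports in general. `sum_languages` and `check_report` are hypothetical helpers.

```python
from typing import Dict, List

COUNT_FIELDS = ("files", "lines", "code", "comments", "blanks", "complexity")


def sum_languages(languages: List[dict]) -> Dict[str, int]:
    """Add up the count fields across a list of per-language entries."""
    return {field: sum(lang[field] for lang in languages) for field in COUNT_FIELDS}


def check_report(report: dict) -> List[str]:
    """Return repository names whose totals differ from their language sums."""
    mismatched = []
    for repo in report["repositories"]:
        expected = sum_languages(repo["languages"])
        actual = {field: repo["totals"][field] for field in COUNT_FIELDS}
        if expected != actual:
            mismatched.append(repo["repository"])
    return mismatched
```

The report can be loaded with `json.load(open("output/report.json"))` and passed to `check_report`. An empty list means every repository's totals are consistent with its language breakdown.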

Markdown

The --markdown flag generates a GitHub-flavored markdown report with:

Summary table with aggregate metrics
Language breakdown sorted by code lines
Per-repository table with links
Error section for repos that failed to process
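
For readers who want a custom layout, a language breakdown sorted by code lines can also be rendered directly from the JSON report. This sketch is not the renderer behind `--markdown`. The column selection and the `language_table` name are assumptions, and it reads the `by_language` array shown in the JSON example.

```python
from typing import List


def language_table(by_language: List[dict]) -> str:
    """Render by_language entries as a markdown table, largest code count first."""
    rows = sorted(by_language, key=lambda lang: lang["code"], reverse=True)
    lines = [
        "| Language | Files | Code | Comments | Blanks | Complexity |",
        "| --- | ---: | ---: | ---: | ---: | ---: |",
    ]
    for lang in rows:
        lines.append(
            f"| {lang['name']} | {lang['files']} | {lang['code']} "
            f"| {lang['comments']} | {lang['blanks']} | {lang['complexity']} |"
        )
    return "\n".join(lines)
```

Calling `language_table(report["by_language"])` on a loaded report returns a string that can be written to a `.md` file or pasted into a pull request description.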

License

MIT License - see LICENSE for details.

Directories

Path	Synopsis
cmd/codemium	command
internal/aidetect
internal/aiestimate
internal/analyzer	internal/analyzer/analyzer.go
internal/auth	internal/auth/bitbucket.go
internal/churn
internal/health	internal/health/details.go
internal/history
internal/license
internal/model	internal/model/model.go
internal/narrative
internal/output	internal/output/json.go
internal/provider	internal/provider/bitbucket.go
internal/sbom
internal/secrets
internal/ui	Package ui provides progress display for repository analysis.
internal/worker	internal/worker/pool.go
