Version: v0.10.0
Published: Mar 2, 2026
License: MIT

Codemium

Generate code statistics across all repositories in a Bitbucket Cloud workspace, GitHub organization, GitHub user account, or GitLab group. Produces per-repo and aggregate metrics including lines of code, comments, blanks, and cyclomatic complexity for 200+ languages.

Features

  • Analyze all repos in a Bitbucket workspace, GitHub organization, GitHub user account, or GitLab group
  • Filter by Bitbucket projects, specific repos, or exclusion lists
  • Per-language breakdown: files, code lines, comments, blanks, complexity
  • Automatic vendor/generated/binary file filtering for accurate metrics (powered by go-enry)
  • Per-repo license detection with SPDX identifiers (e.g., MIT, Apache-2.0)
  • Code churn and hotspot analysis: find files that change most often and are most complex
  • JSON output to file (default: output/report.json) and optional markdown summary
  • Parallel processing with configurable concurrency
  • Progress bar in terminal, plain text fallback in CI/CD
  • AI code estimation: detect AI-assisted commits via co-author tags, message patterns, and bot authors
  • Pure Go, no external dependencies at runtime (no git or scc binary needed)

Installation

Homebrew
brew install dsablic/tap/codemium
Pre-built binaries

Download from the releases page.

From source
go install github.com/dsablic/codemium/cmd/codemium@latest

Authentication

Codemium supports interactive token login, tokens reused from the gh and glab CLIs, the GitHub OAuth device flow, and tokens supplied through environment variables.

Bitbucket

Option 1: API token (interactive)

  1. Create a scoped API token at https://id.atlassian.com/manage-profile/security/api-tokens — click "Create API token with scopes", select Bitbucket as the app, and enable Repository Read
  2. Run:
    codemium auth login --provider bitbucket
    
    This prompts for your Atlassian email and API token. Credentials are verified against the Bitbucket API and stored at ~/.config/codemium/credentials.json.

Option 2: Environment variable (CI/CD)

export CODEMIUM_BITBUCKET_USERNAME=your_email
export CODEMIUM_BITBUCKET_TOKEN=your_api_token
GitHub

Option 1: gh CLI (recommended)

If you already have the GitHub CLI installed and authenticated, codemium uses its token automatically — no extra setup needed:

# If not already authenticated:
gh auth login

# Then just run codemium directly:
codemium analyze --provider github --org myorg

You can also explicitly save the token to codemium's credential store:

codemium auth login --provider github

Option 2: OAuth device flow

If you have a GitHub OAuth App, you can use the device flow instead:

export CODEMIUM_GITHUB_CLIENT_ID=your_client_id
codemium auth login --provider github

This displays a code to enter at github.com/login/device.

Option 3: Environment variable (CI/CD)

export CODEMIUM_GITHUB_TOKEN=your_personal_access_token

Resolution order: CODEMIUM_GITHUB_TOKEN env var > saved credentials > gh auth token CLI.

GitLab

Option 1: Personal access token (interactive)

  1. Create a personal access token at https://gitlab.com/-/user_settings/personal_access_tokens with the read_api scope
  2. Run:
    codemium auth login --provider gitlab
    
    This prompts for your token, verifies it against the GitLab API, and stores it at ~/.config/codemium/credentials.json.

Option 2: glab CLI

If you have the GitLab CLI installed and authenticated, codemium can use its token automatically:

glab auth login
codemium analyze --provider gitlab --group mygroup

Option 3: Environment variable (CI/CD)

export CODEMIUM_GITLAB_TOKEN=your_personal_access_token

Resolution order: CODEMIUM_GITLAB_TOKEN env var > saved credentials > glab config get token CLI.
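
The GitHub and GitLab resolution orders above can be checked from a shell before a run. The following is a minimal diagnostic sketch, not codemium's own implementation. The function name token_source is hypothetical, it only reports which source would be expected to win, and it assumes that an existing credentials.json holds an entry for the provider in question, which it does not verify.

```shell
# token_source PROVIDER  ->  prints env, saved, cli, or none
# Mirrors the documented order: env var > saved credentials > provider CLI.
token_source() {
  provider="$1"
  # Path taken from the README. Whether the file contains an entry for
  # this particular provider is NOT checked (assumption).
  creds_file="${HOME}/.config/codemium/credentials.json"

  case "$provider" in
    github) env_token="${CODEMIUM_GITHUB_TOKEN:-}" ;;
    gitlab) env_token="${CODEMIUM_GITLAB_TOKEN:-}" ;;
    *) echo "unsupported provider: $provider" >&2; return 2 ;;
  esac

  if [ -n "$env_token" ]; then
    echo "env"
  elif [ -f "$creds_file" ]; then
    echo "saved"
  elif [ "$provider" = "github" ] && command -v gh >/dev/null 2>&1 \
      && [ -n "$(gh auth token 2>/dev/null)" ]; then
    echo "cli"
  elif [ "$provider" = "gitlab" ] && command -v glab >/dev/null 2>&1 \
      && [ -n "$(glab config get token 2>/dev/null)" ]; then
    echo "cli"
  else
    echo "none"
  fi
}
```

Running token_source github in a CI job is a quick way to confirm that the environment variable, rather than a leftover credentials file, is the source in effect.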

Usage

Analyze a Bitbucket workspace
# All repos in a workspace
codemium analyze --provider bitbucket --workspace myworkspace

# Filter by Bitbucket projects
codemium analyze --provider bitbucket --workspace myworkspace --projects PROJ1,PROJ2

# Specific repos only
codemium analyze --provider bitbucket --workspace myworkspace --repos repo1,repo2

# Exclude repos
codemium analyze --provider bitbucket --workspace myworkspace --exclude old-repo,deprecated-repo
Analyze a GitHub organization
# All repos in an org
codemium analyze --provider github --org myorg

# Specific repos
codemium analyze --provider github --org myorg --repos api,frontend
Analyze a GitHub user's repos
# All repos for a user (includes private repos the token has access to)
codemium analyze --provider github --user myuser

# Specific repos
codemium analyze --provider github --user myuser --repos repo1,repo2
Analyze a GitLab group
# All repos in a group (includes subgroups)
codemium analyze --provider gitlab --group mygroup

# Nested group
codemium analyze --provider gitlab --group myorg/mysubgroup

# Specific repos
codemium analyze --provider gitlab --group mygroup --repos api,frontend

The trends command analyzes repositories at historical points in time using git history, showing how codebases evolve over configurable intervals.

# Monthly trends for the past year
codemium trends --provider github --org myorg --since 2025-03 --until 2026-02

# Weekly trends
codemium trends --provider github --org myorg --since 2025-01-01 --until 2025-03-01 --interval weekly

# Output to file, then convert to markdown
codemium trends --provider github --org myorg --since 2025-01 --until 2025-12 --output trends.json
codemium markdown trends.json > trends.md

Note: For Bitbucket, trends requires OAuth credentials (not API tokens), since it needs to clone full git history. Set CODEMIUM_BITBUCKET_CLIENT_ID and CODEMIUM_BITBUCKET_CLIENT_SECRET, then run codemium auth login --provider bitbucket.

Output options
# JSON to default file (output/report.json)
codemium analyze --provider github --org myorg

# JSON to custom file
codemium analyze --provider github --org myorg --output report.json

# Markdown summary
codemium analyze --provider github --org myorg --markdown report.md

# Both
codemium analyze --provider github --org myorg --output report.json --markdown report.md
AI narrative analysis

Generate a rich narrative analysis of your codebase using an AI CLI:

# Auto-detect AI CLI (tries claude, codex, gemini in order)
codemium markdown --narrative report.json

# Use a specific AI CLI
codemium markdown --narrative --ai-cli gemini report.json

# Add custom instructions
codemium markdown --narrative --ai-prompt "Focus on test coverage gaps" report.json

# Load instructions from file
codemium markdown --narrative --ai-prompt-file analysis-prompt.txt report.json

# Works with trends reports too
codemium markdown --narrative trends.json

Requires one of: Claude Code, Codex CLI, or Gemini CLI installed and authenticated.

Providing context for better narratives: The AI generates richer analysis when given domain context about your organization. Use --ai-prompt or --ai-prompt-file to describe project areas, team structure, or what specific repos contain:

# Inline context
codemium markdown --narrative --ai-prompt 'Project codes map to these areas:
- SVC = Backend Services
- WEB = Customer-Facing Web Apps
- MOB = Mobile Apps (iOS & Android)
- PLAT = Platform & Infrastructure
- SDK = Public SDKs and Client Libraries

The SVC repos include both microservices and shared libraries.
The PLAT team also maintains CI/CD pipelines.' report.json

# Or load from a file for longer descriptions
codemium markdown --narrative --ai-prompt-file org-context.txt report.json

This is especially useful when Bitbucket project codes or repo naming conventions aren't self-explanatory — the AI will use your descriptions to assign human-readable names and provide more insightful analysis.

Repository health classification

Classify repositories as Active, Maintained, Abandoned, or Failed based on commit history:

# Quick health check (1 API call per repo, no cloning)
codemium analyze --provider github --org myorg --health

# Deep health analysis with author counts, churn, and velocity per window
codemium analyze --provider github --org myorg --health-details

# Limit commits scanned for deep analysis (default: 500)
codemium analyze --provider github --org myorg --health-details --health-commit-limit 200

Health categories:

  • Active: last commit < 180 days ago
  • Maintained: 180–365 days ago
  • Abandoned: > 365 days ago
  • Failed: commit history could not be fetched (API error, permissions, etc.)
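
The thresholds above can be written as a small shell function. This is a sketch of the documented rules only: the name classify_health is hypothetical, the input is the number of whole days since the last commit, and an empty argument stands in for a commit history that could not be fetched.

```shell
# classify_health DAYS_SINCE_LAST_COMMIT
# Boundaries follow the list above: under 180 days is Active,
# 180 through 365 is Maintained, anything older is Abandoned.
classify_health() {
  days="$1"
  if [ -z "$days" ]; then
    echo "Failed"          # no commit data available
  elif [ "$days" -lt 180 ]; then
    echo "Active"
  elif [ "$days" -le 365 ]; then
    echo "Maintained"
  else
    echo "Abandoned"
  fi
}

classify_health 200   # prints Maintained
```
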
Persistent cloning

By default, repos are cloned to a temp directory and deleted after analysis. Use --clone to keep them:

# Clone repos to ./repos/<repo-slug>/ and keep them after analysis
codemium analyze --provider github --org myorg --clone ./repos

# Subsequent runs reuse existing clones (no re-download)
codemium analyze --provider github --org myorg --clone ./repos
Secret scanning

Scan repositories for leaked secrets (API keys, tokens, passwords) using gitleaks:

codemium analyze --provider github --org myorg --secrets

Results show per-repo finding counts and which files contain secrets (actual secret values are never included in the report).

Dependency inventory (SBOM)

Generate a software bill of materials for each repository:

codemium analyze --provider github --org myorg --sbom

The report includes per-repo dependency counts with ecosystem breakdown (e.g. go-module, npm, pip).

Rate limiting and error logging

API requests that receive a 429 (Too Many Requests) response are retried automatically with exponential backoff (up to 5 retries). Use --rate-limit to throttle requests proactively and stay under a provider's limit. For example, --rate-limit 5 allows 300 requests per minute, which matches GitLab's 300 req/min raw endpoint limit.
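
To make the retry schedule and the rate-limit arithmetic concrete, here is a small sketch. The 1-second base delay, the doubling factor, and the absence of jitter are assumptions for illustration; the README only states that backoff is exponential with up to 5 retries. Both function names are hypothetical.

```shell
# backoff_delays [MAX_RETRIES] [BASE_SECONDS]
# Prints the wait before each retry, doubling every time.
backoff_delays() {
  max_retries="${1:-5}"
  delay="${2:-1}"          # assumed base delay, not documented
  attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    echo "retry $attempt: wait ${delay}s"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# per_second_limit REQUESTS_PER_MINUTE
# Converts a per-minute quota into a value for --rate-limit (rounded down).
per_second_limit() {
  echo $(( $1 / 60 ))
}

per_second_limit 300   # prints 5
```

With these assumed defaults the five waits are 1, 2, 4, 8, and 16 seconds, so a request that keeps receiving 429 gives up after roughly half a minute.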

When API errors occur during health classification, AI estimation, or detailed analysis, an error log is automatically written next to the JSON report (e.g., output/report.error.log for output/report.json). Each line is prefixed with a category ([health], [health-details], [ai-estimate], [ai-estimate-detail]) for easy filtering with grep.
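
The category prefixes make the error log easy to slice with grep. The helpers below are a sketch: the function names are hypothetical, and the only thing assumed about the log format is what is stated above, namely that each line starts with the bracketed category.

```shell
# errors_for CATEGORY LOG_FILE: print only the lines with that prefix.
# Matching the closing bracket keeps [health] from also matching
# [health-details], and [ai-estimate] from matching [ai-estimate-detail].
errors_for() {
  grep "^\[$1]" "$2"
}

# error_summary LOG_FILE: print "<count> <category>" for every category.
error_summary() {
  for category in health health-details ai-estimate ai-estimate-detail; do
    # grep -c exits non-zero on zero matches but still prints 0.
    count="$(grep -c "^\[$category]" "$1" || true)"
    echo "$count $category"
  done
}
```

For example, errors_for health-details output/report.error.log shows only the failures from the deep health analysis.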

Additional flags
--concurrency 10            # Parallel workers (default: 5)
--rate-limit 5              # Max API requests per second (default: unlimited)
--include-archived          # Include archived repos (excluded by default)
--include-forks             # Include forked repos (excluded by default)
--ai-estimate               # Estimate AI-generated code via commit history analysis
--ai-commit-limit 200       # Max commits to scan per repo (default: 200)
--health                    # Classify repos by activity level
--health-details            # Deep health analysis (implies --health)
--health-commit-limit 500   # Max commits for health details (default: 500)
--churn                     # Enable code churn and hotspot analysis
--churn-limit 500           # Max commits to scan per repo for churn (default: 500)
--clone ./repos             # Persist cloned repos to directory (reuses existing clones)
--secrets                   # Scan repos for secrets (API keys, tokens, passwords)
--sbom                      # Generate dependency inventory (SBOM) per repository
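
Several of these flags are typically used together in a scheduled job. The wrapper below is a sketch under assumptions: the function name codemium_audit and the DRY_RUN switch are hypothetical, the chosen flag combination is only an example, and it has not been confirmed here that every flag pair works in a single invocation.

```shell
# codemium_audit ORG [OUT_DIR]
# Assembles one analyze command with health, churn, secrets, and SBOM enabled.
# Set DRY_RUN=1 to print the command instead of running it.
codemium_audit() {
  org="$1"
  out_dir="${2:-output}"
  # Reuse the function's positional parameters as the argument list.
  set -- codemium analyze --provider github --org "$org" \
    --health-details --churn --secrets --sbom \
    --concurrency 10 --rate-limit 5 \
    --output "$out_dir/report.json" --markdown "$out_dir/report.md"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running DRY_RUN=1 codemium_audit myorg prints the full command line, which is handy for reviewing a CI configuration before it spends API quota.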

Output Format

JSON
{
  "generated_at": "2026-02-18T12:00:00Z",
  "provider": "github",
  "organization": "myorg",
  "filters": {},
  "repositories": [
    {
      "repository": "my-repo",
      "provider": "github",
      "url": "https://github.com/myorg/my-repo",
      "languages": [
        {
          "name": "Go",
          "files": 42,
          "lines": 5000,
          "code": 3800,
          "comments": 400,
          "blanks": 800,
          "complexity": 120
        }
      ],
      "totals": {
        "files": 42,
        "lines": 5000,
        "code": 3800,
        "comments": 400,
        "blanks": 800,
        "complexity": 120
      }
    }
  ],
  "totals": {
    "repos": 1,
    "files": 42,
    "lines": 5000,
    "code": 3800,
    "comments": 400,
    "blanks": 800,
    "complexity": 120
  },
  "by_language": [
    {
      "name": "Go",
      "files": 42,
      "lines": 5000,
      "code": 3800,
      "comments": 400,
      "blanks": 800,
      "complexity": 120
    }
  ]
}
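
The report can be post-processed with any JSON tool. The sketch below uses jq, which is assumed to be installed separately (it is not part of codemium), and relies only on the fields shown in the sample above. The function names are hypothetical.

```shell
# repos_by_code REPORT: one "code<TAB>repository" line per repo, largest first.
repos_by_code() {
  jq -r '.repositories
         | sort_by(-.totals.code)
         | .[]
         | "\(.totals.code)\t\(.repository)"' "$1"
}

# comment_percent REPORT: aggregate comment lines per 100 code lines,
# rounded down; prints 0 when the report contains no code lines.
comment_percent() {
  jq -r '.totals
         | if .code > 0 then (.comments * 100 / .code | floor) else 0 end' "$1"
}
```

For the sample report above, comment_percent prints 10 (400 comment lines against 3800 code lines).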
Markdown

The --markdown flag generates a GitHub-flavored markdown report with:

  • Summary table with aggregate metrics
  • Language breakdown sorted by code lines
  • Per-repository table with links
  • Error section for repos that failed to process

License

MIT License - see LICENSE for details.

Directories

Path                Synopsis
cmd/codemium        command
internal/analyzer   internal/analyzer/analyzer.go
internal/auth       internal/auth/bitbucket.go
internal/health     internal/health/details.go
internal/model      internal/model/model.go
internal/output     internal/output/json.go
internal/provider   internal/provider/bitbucket.go
internal/ui         Package ui provides progress display for repository analysis.
internal/worker     internal/worker/pool.go
