CytoScnPy - Roadmap & Development Guide

Architecture: Hybrid PyO3 + Standalone CLI
Status: Production-ready core, active development

For completed features and implementation history, see GitHub Releases.

Table of Contents

Project Status
In Progress
Future Roadmap
Phase 9: Developer Experience
Phase 10: Deep Analysis & Security
Phase 11: Auto-Remediation

Project Status

The PyO3-based CLI, bundled cytoscnpy-cli binary, and cytoscnpy-mcp server together deliver the production-ready analysis stack described below: quality, secret, and clone reporting (see docs/CLI.md, docs/usage.md, and README.md) are exercised across platforms, while the VS Code extension and HTML reporting surface that data to editors and browsers.

Core safeguards—quality gates (--fail-on-quality, --fail-threshold), security scanning, --fix/--apply auto-remediation, and stdin/stdout-based MCP hosting—are stable and shipping; the roadmap now focuses on higher-accuracy analysis and better user experience on top of that foundation.

In Progress

The sections below highlight the work that is still active:

Phase 5.7 (Radon Parity Gaps) – The parity tests around module-level complexity, else clauses on loops/try, wildcard matching, and Halstead/raw metrics are in place, but the analyzer logic still needs to be implemented (see section 5.7 below).
Phase 6 (Editor Integration) – The VS Code extension and accompanying code audit continue to evolve; Phase 6.1 and 6.2 list the UX, command, and bundling gaps that remain.
Phase 6.x (Editor Client Flag) – Implement --client (currently only vscode). In the future we may need other editor clients, so keep room to extend behavior safely.
Phase 7.6 (Accuracy Improvements) – The benchmark (F1 = 0.72) and the remaining false positives/negatives (34/60 items) are being chipped away in the dedicated Phase 7.6 subsection.

5.7 Radon Parity Gaps IN PROGRESS

Status: Tests added, implementation pending. See cytoscnpy/tests/radon_parity_*.rs

These features are tested but not yet implemented. Remove #[ignore] from tests when implementing.

Complexity Gaps (19 tests ignored)

| Feature | Description | Test File | Radon Behavior |
|---------|-------------|-----------|----------------|
| Module-level complexity | Complexity of code outside functions | `radon_parity_complexity_test.rs` | Radon reports module-level `if`/`for`/`while` etc. |
| For/while else clause | `else:` on loops adds +1 complexity | `radon_parity_complexity_test.rs` | Radon counts loop `else:` as branch |
| Try-except else clause | `else:` on try adds +1 complexity | `radon_parity_complexity_test.rs` | Radon counts try `else:` as branch |
| Lambda ternary | Ternary inside lambda adds complexity | `radon_parity_complexity_test.rs` | Ternary in lambda body counts |
| Ternary with generator | Generator inside ternary | `radon_parity_complexity_test.rs` | Nested comprehension complexity |
| Match wildcard | `case _:` shouldn't add complexity | `radon_parity_complexity_test.rs` | Wildcard is default, not branch |
| Nested generator | Inner generator adds complexity | `radon_parity_complexity_test.rs` | Each `for`/`if` in nested generator |
| Class method `or` | Boolean `or` in condition | `radon_parity_complexity_test.rs` | `or` adds +1 complexity |
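
To make a few of the table rows concrete, here is a minimal sketch of a cyclomatic-complexity counter built on Python's standard `ast` module. It is an illustration only: `ComplexityVisitor` and `complexity` are hypothetical names, the counting rules are a simplified reading of the table above rather than Radon's or CytoScnPy's actual code, and lambdas, comprehensions, and `match` are left out.

```python
import ast


class ComplexityVisitor(ast.NodeVisitor):
    """Simplified complexity counter (illustrative, not Radon's implementation)."""

    def __init__(self) -> None:
        self.complexity = 1  # one straight-line path through the code

    def visit_If(self, node: ast.If) -> None:
        self.complexity += 1
        self.generic_visit(node)

    def _visit_loop(self, node) -> None:
        # +1 for the loop itself, +1 more when an `else:` clause is present
        self.complexity += 1 + bool(node.orelse)
        self.generic_visit(node)

    visit_For = visit_While = _visit_loop

    def visit_Try(self, node: ast.Try) -> None:
        # each `except` handler is a branch; `else:` on try adds one more
        self.complexity += len(node.handlers) + bool(node.orelse)
        self.generic_visit(node)

    def visit_BoolOp(self, node: ast.BoolOp) -> None:
        # `a or b or c` contributes two extra decision points
        self.complexity += len(node.values) - 1
        self.generic_visit(node)


def complexity(source: str) -> int:
    """Complexity of a whole module, including code outside functions."""
    visitor = ComplexityVisitor()
    visitor.visit(ast.parse(source))
    return visitor.complexity


print(complexity("for x in xs:\n    pass\nelse:\n    pass\n"))  # prints 3
```

Because the visitor starts from the module node, module-level `if`/`for`/`while` statements are counted the same way as those inside functions.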

Halstead Gaps (1 test ignored)

| Feature | Description | Test File | Radon Behavior |
|---------|-------------|-----------|----------------|
| Distinct operand counting | `if a and b: elif b or c:` | `radon_parity_halstead_test.rs` | `b` counted once as distinct |
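
The distinction between distinct and total operands can be sketched as follows. This is a rough illustration that treats every loaded name as an operand; real Halstead counting also includes constants, and Radon's exact rules may differ. `operand_counts` is a hypothetical helper.

```python
import ast


def operand_counts(source: str) -> tuple[int, int]:
    """Return (distinct, total) operand counts, treating each loaded name as an operand."""
    names = [
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    ]
    return len(set(names)), len(names)


SOURCE = "if a and b:\n    pass\nelif b or c:\n    pass\n"
print(operand_counts(SOURCE))  # prints (3, 4): `b` occurs twice but is distinct once
```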

Raw Metrics Gaps (2 tests ignored)

| Feature | Description | Test File | Radon Behavior |
|---------|-------------|-----------|----------------|
| Line continuation with string | Backslash + multiline string | `radon_parity_raw_test.rs` | Continuation counted correctly |
| Line continuation with comment | Backslash + inline comment | `radon_parity_raw_test.rs` | Comment on continuation line |

Implementation Priority

1. Module-level complexity - High impact (8 tests), required for full Radon parity
2. Loop/try else clauses - Medium impact (5 tests), common pattern
3. Match wildcard handling - Low impact (2 tests), Python 3.10+ only
4. Halstead/Raw edge cases - Low impact (3 tests), edge cases

Phase 8: CFG Integration ✅ DONE

8.1 Control Flow Graph Construction ✅

Implemented CFG construction from Python AST for behavioral analysis:

CFG Builder: Constructs basic blocks from StmtFunctionDef
Control Flow: Handles if, for, while, try, match, break, continue, return, raise
Loop Depth: Tracks nesting depth for each basic block
Fingerprinting: Behavioral signature for clone comparison

8.2 Clone Detection Integration ✅

CFG validation as secondary filter for clone detection:

| Feature | Description |
|---------|-------------|
| `cfg_validation` config | Enable/disable CFG validation in `CloneConfig` |
| `validate_with_cfg()` | Phase 4.5 filter in `CloneDetector::detect()` |
| `cfg_validated` context | +15 confidence boost in `ConfidenceScorer` |
| 70% similarity threshold | CFG pairs below this are filtered out |
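
The filter-then-boost behaviour in the table can be sketched in a few lines. This is a Python stand-in for the Rust `validate_with_cfg()` step: the `ClonePair` type is hypothetical, and clamping the boosted confidence to 100 is an assumption, not something the table states.

```python
from dataclasses import dataclass, replace

CFG_SIMILARITY_THRESHOLD = 0.70  # from the table above
CFG_CONFIDENCE_BOOST = 15        # from the table above


@dataclass(frozen=True)
class ClonePair:  # hypothetical shape of a candidate clone pair
    left: str
    right: str
    confidence: int        # 0-100
    cfg_similarity: float  # 0.0-1.0


def validate_with_cfg(pairs: list[ClonePair]) -> list[ClonePair]:
    """Drop pairs below the CFG threshold and boost the survivors."""
    kept = []
    for pair in pairs:
        if pair.cfg_similarity < CFG_SIMILARITY_THRESHOLD:
            continue  # behaviourally too different to be a clone
        boosted = min(100, pair.confidence + CFG_CONFIDENCE_BOOST)  # clamp is assumed
        kept.append(replace(pair, confidence=boosted))
    return kept
```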

8.3 Feature Flag

Enabled via --features cfg at compile time:

```
cargo build --features cfg
cargo test --features cfg
```

Phase 6: Editor Integration IN PROGRESS

6.1 VS Code Extension IN PROGRESS

6.2 Extension Code Audit (Pending Fixes) IN PROGRESS

6.2.3 JSON Parsing Completeness ✅

Fields in CLI JSON output not captured by analyzer.ts:

[ ] Add summary stats display in output channel

6.2.4 Missing Commands

| Command | Description | Status |
|---------|-------------|--------|
| `cytoscnpy.taintAnalysis` | Run taint analysis specifically | ❌ |

6.2.5 Path Handling ✅

[ ] Add macOS (cytoscnpy-cli-darwin) binary bundling
[ ] Add Linux (cytoscnpy-cli-linux) binary bundling

6.2.6 UX Enhancements

| Feature | Description | Priority | Status |
|---------|-------------|----------|--------|
| Status Bar | Show finding count in status bar | Medium | ❌ |
| Sidebar Badge | Show issue count in Explorer sidebar | Medium | ✅ |
| Quick Fixes | Code actions to remove/comment unused code | High | ✅ |
| Gutter Decorations | Visual icons for severity levels | Low | ✅ |
| Progress Indicator | Show progress during workspace analysis | Medium | ❌ |
| File Caching | Skip re-analyzing unchanged files | Low | ✅ |
| Problem Grouping | Better categorization in Problems panel | Low | ✅ |

Phase 7: Infrastructure & Quality ✅ DONE

7.2 Error Handling ✅

7.3 Parser Migration: `rustpython-parser` → `ruff_python_parser` ✅

Reference: See RustPython/Cargo.toml for working example.

Phase 7.5: Performance Optimizations ✅ DONE

Pending Optimizations (Low Priority)

| Optimization | Description | Priority | Complexity | Est. Impact |
|--------------|-------------|----------|------------|-------------|
| Reduce remaining `.clone()` calls | Audit and eliminate unnecessary clones (~50+ remaining) | Low | Low | 2-5% |
| String interning | Use `string-cache` crate for repeated strings (module names, type names) | Low | Medium | 3-5% |
| Profile-Guided Optimization (PGO) | Build with profile data for 5-10% improvement | Low | Medium | 5-10% |
| Parallel AST traversal | Split large files for parallel statement processing | Very Low | High | 10-20% on large files |

❌ Not Needed / Deferred

| Optimization | Reason |
|--------------|--------|
| Scope resolution caching | Already optimized via `cached_scope_prefix` |
| `Arc<String>` for `module_name` | Would require significant refactoring, marginal benefit |
| `FxHashSet` audit | Already using `FxHashSet` in most places |

Phase 7.6: Accuracy Improvements IN PROGRESS

Systematic improvements to detection accuracy based on benchmark analysis.

Current Status: F1 = 0.72 (110 TP, 46 FP, 38 FN)
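
For reference, the F1 figure follows from the TP/FP/FN counts on the status line through the standard precision and recall definitions. The helper name `f1_score` is just for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # share of flagged items that are truly unused
    recall = tp / (tp + fn)     # share of truly unused items that were flagged
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(tp=110, fp=46, fn=38), 2))  # prints 0.72
```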

7.6.1 Completed Fixes ✅

[x] Framework Decorator Tracking: Accurate detection for FastAPI, Django, and Celery entry points.
[x] TYPE_CHECKING Block Handling: Correctly ignores imports used only in type-check blocks.
[x] F-string Reference Detection: Tracking variables and functions referenced within f-string interpolations.
[x] Multi-line String LOC: Improved metrics for backslash-continued strings and comments.

7.6.2 Remaining False Positives (34 items)

Items incorrectly flagged as unused.

| Category | Count | Issue | Priority | Fix Difficulty |
|----------|-------|-------|----------|----------------|
| Functions | 17 | Closures, returned functions, pattern matching bindings | High | Medium |
| Imports | 6 | Cross-file `__all__` re-exports, FastAPI `Depends` | High | Medium |
| Variables | 6 | Closure captures, complex scoping | Medium | Hard |
| Methods | 3 | Pydantic `from_dict`/`to_dict` patterns | Low | Easy |
| Classes | 2 | FastAPI response models (`In`, `Out`) | Low | Easy |

Priority 1: Cross-File `__all__` Tracking

Problem: Imports re-exported via __all__ in other modules are flagged:

```python
# module_a.py
from module_b import ExportedClass  # Flagged as unused

# module_b.py
__all__ = ["ExportedClass"]  # Should mark as used across files
```

Solution: Track __all__ exports globally and match against imports in other files.
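
A minimal sketch of that matching step, assuming the whole project is available as a mapping from module name to source text. It uses Python's `ast` module for illustration; the function names are hypothetical and the real analyzer is written in Rust. Only literal `__all__ = [...]` assignments are handled.

```python
import ast


def all_exports(source: str) -> set[str]:
    """Names listed in a module-level `__all__ = [...]` assignment."""
    exports: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            if isinstance(node.value, (ast.List, ast.Tuple)):
                exports |= {
                    e.value for e in node.value.elts
                    if isinstance(e, ast.Constant) and isinstance(e.value, str)
                }
    return exports


def imports_kept_by_all(modules: dict[str, str]) -> set[tuple[str, str]]:
    """(importing module, name) pairs that match another module's `__all__`."""
    exports = {name: all_exports(src) for name, src in modules.items()}
    kept: set[tuple[str, str]] = set()
    for name, src in modules.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.ImportFrom) and node.module in exports:
                kept |= {
                    (name, alias.name) for alias in node.names
                    if alias.name in exports[node.module]
                }
    return kept


modules = {
    "module_a": "from module_b import ExportedClass\n",
    "module_b": "__all__ = ['ExportedClass']\n",
}
print(imports_kept_by_all(modules))  # prints {('module_a', 'ExportedClass')}
```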

Priority 2: Pattern Matching Bindings

Problem: Variables bound in match statements are flagged:

```python
match command:
    case (action, value):  # 'action' and 'value' flagged as unused
        handle(action, value)
```

Solution: Track match case bindings as references.

Priority 3: Returned Inner Functions

Problem: Functions returned from factory functions are flagged:

```python
def factory():
    def inner():  # Flagged as unused
        pass
    return inner  # Should mark 'inner' as used
```

Status: ✅ Fixed in return statement tracking improvements.

7.6.3 Remaining False Negatives (60 items)

Genuinely unused items we fail to detect.

| Category | Count | Issue | Priority | Fix Difficulty |
|----------|-------|-------|----------|----------------|
| Functions | 19 | Pragma-ignored, security examples, FastAPI deps | Low | N/A (intentional) |
| Variables | 18 | Complex scoping, pattern matching, class attrs | High | Medium |
| Imports | 12 | Various tracking gaps | Medium | Medium |
| Methods | 10 | Methods inside unused classes not linked | High | Medium |
| Classes | 1 | Complex inheritance patterns | Low | Hard |

Priority 1: Class-Method Linking ✅ DONE

Problem: Methods inside unused classes are not detected.

```python
class UnusedClass:  # Detected as unused ✅
    def method(self):  # NOW detected via cascading detection ✅
        pass
```

Solution: When a class is unused, automatically mark all its methods as unused (cascading deadness).

Implementation: Modified aggregate_results() and analyze_code() in processing.rs to flag all methods within unused classes. Respects heuristic protections (visitor pattern methods are excluded).
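
The cascading rule can be pictured with a small sketch. The data shapes are hypothetical, and using the `visit_` prefix to recognise visitor-pattern methods is an assumption made for the example; the actual heuristics live in `processing.rs`.

```python
def cascade_unused(unused_classes: set[str], methods: dict[str, list[str]]) -> set[str]:
    """Flag every method of an unused class, except heuristically protected ones."""
    flagged: set[str] = set()
    for cls in unused_classes:
        for method in methods.get(cls, []):
            if method.startswith("visit_"):  # assumed stand-in for the visitor-pattern protection
                continue
            flagged.add(f"{cls}.{method}")
    return flagged


methods = {"UnusedClass": ["method", "visit_Name"], "UsedClass": ["run"]}
print(cascade_unused({"UnusedClass"}, methods))  # prints {'UnusedClass.method'}
```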

Priority 2: Variable Scope Improvements

Problem: Local variables in complex scopes are missed.

```python
def func():
    x = 1  # Never used after assignment - should be flagged
    y = process()
    return y
```

Solution: Improve variable liveness analysis within function scopes.
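
As a starting point, the simplest version of this check compares the names a function stores with the names it loads. The sketch below is flow-insensitive, so it is weaker than real liveness analysis (which needs the control-flow graph to see that a value is overwritten before being read); `dead_locals` is a hypothetical helper.

```python
import ast


def dead_locals(func_source: str) -> set[str]:
    """Names assigned somewhere in the function but never read anywhere in it."""
    func = ast.parse(func_source).body[0]
    stored: set[str] = set()
    loaded: set[str] = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return stored - loaded


SOURCE = "def func():\n    x = 1\n    y = process()\n    return y\n"
print(dead_locals(SOURCE))  # prints {'x'}
```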

Priority 3: Import Detection Gaps

Problem: Some import patterns are not detected:

Imports in type annotations without string quotes
Imports used only in comprehensions
Star imports (from x import *)

7.6.4 Accuracy Improvement Roadmap

| Phase | Target F1 | Key Fixes | Status |
|-------|-----------|-----------|--------|
| 7.6.1 | 0.63 | Return annotations, TYPE_CHECKING | ✅ Done |
| 7.6.2 | 0.68 | Cross-file `__all__`, pattern matching | PLANNED |
| 7.6.3 | 0.72 | Class-method linking, variable scopes | PLANNED |
| 7.6.4 | 0.75 | Import gaps, framework patterns | PLANNED |

Phase 8: Advanced Framework Support ✅ DONE

Support for Django, FastAPI, and Pydantic is done ✅.

Future Roadmap

Phase 9: Developer Experience

Tools to improve the workflow around CytoScnPy.

[x] Git Hooks (pre-commit) ✅
Automated analysis on commit/push.
See docs/pre-commit.md for setup instructions.
[x] CI/CD Integration Examples ✅
Reference workflows for GitHub Actions provided in .github/workflows/.
Supports --fail-on-quality and --fail-threshold for gatekeeping.
[x] uv Package Manager Integration ✅
Full support for uv-managed environments.
Used in official lint/CI workflows.
[ ] MCP HTTP/SSE Transport
Add HTTP/SSE transport for remote LLM integrations (web-based clients, APIs).
Challenges to Address:
- Path validation/sandboxing for security
- Timeout handling for large project analysis (30-60s)
Remote Analysis Tools:

| Tool | Input | Use Case |
|------|-------|----------|
| analyze_code | Code string | Small snippets (already works) |
| analyze_files | JSON map of files | Medium projects via upload |
| analyze_repo | Git URL | Clone & analyze public repos |
| analyze_path | Local path | Server-local files only |

Implementation:
- Add --http --port 3000 CLI flags for transport selection
- Use rmcp SSE transport feature
- Add Git clone support for analyze_repo tool
[ ] LSP Server (Language Server Protocol)
Implement a real-time LSP server for VS Code, Neovim, and Zed.
Provide instant diagnostics without saving or running CLI.
[ ] Config File Support for Notebook Options
Allow include_ipynb and ipynb_cells in .cytoscnpy.toml and pyproject.toml
Currently these are CLI-only flags (--include-ipynb, --ipynb-cells)
Rationale: Enable persistent configuration without passing flags on every run
Implementation: Add fields to CytoScnPyConfig struct in src/config.rs
[ ] Git Integration
Blame Analysis: Identify who introduced unused code.
Incremental Analysis: Analyze only files changed in the current PR/commit.
[x] HTML Report Generation ✅
Generate self-contained HTML reports for large codebase analysis.
Features:
- Syntax highlighting (using highlight.js or prism.js)
- Clickable file links with line numbers
- Filtering by type (unused, security, quality), severity, file
- Search across all findings
- Summary dashboard with charts
- Code snippets showing context around each finding (Basic impl done, see Phase 9.5 for improvements)

CLI:

```
cytoscnpy ./project --html
# Multi-file support planned for large projects
```

Implementation:
- Use tera or askama for templating
- Embed CSS/JS for self-contained output
- Optional: Split large reports into multiple HTML files with index
[x] Security Documentation Overhaul ✅
Categorized all 50+ danger rules into logical modules (Code Execution, Injection, etc.).
Ensured 1:1 parity between documentation and Rust implementation (severities, patterns).
Added safer alternatives and remediation advice for all rules.
See Dangerous Code Rules for details.
[ ] Live Server Mode
Built-in HTTP server to browse analysis results interactively.
Features:
- Auto-refresh on file changes (watch mode)
- REST API for findings (JSON endpoints)
- Interactive code browser with inline annotations
- Severity/type filters with live updates
CLI:
```
cytoscnpy serve ./project --port 8080
```
Technical Approach:
- Use axum or warp for lightweight HTTP server
- WebSocket for live updates
- Serve static HTML + JSON API
Use Cases:
- Team code review sessions
- CI/CD dashboard integration
- Local development feedback loop
Smoke-test reference: cytoscnpy-mcp/scripts/test_mcp_server.py drives cytoscnpy mcp-server over JSON-RPC (initialize, tools/list, analyze_code) so you can validate the CLI-hosted MCP transport before wiring it into downstream clients.
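
The path validation challenge listed under MCP HTTP/SSE Transport could be approached along these lines. This is a sketch under assumptions: `resolve_in_sandbox` is a hypothetical helper, and it shows only the containment check a server-local tool such as analyze_path would need, not the transport itself. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_sandbox(root: str, requested: str) -> Path:
    """Resolve a client-supplied path and refuse anything outside the allowed root."""
    base = Path(root).resolve()
    # Resolving collapses `..` segments and symlinks before the containment check.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes sandbox: {requested}")
    return target
```

An absolute `requested` value replaces the base when joined, so it fails the same containment check unless it already points inside the root.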

Phase 9.5: Report Actionability Upgrade PLANNED

Implementing findings from the Recommendation System Audit.

Goal: Transform the report from a diagnostic tool into a remediation platform.

[ ] Remediation Display Engine (Priority: HIGH)
Problem: Backend has remediation data (e.g., "Use parameterized queries"), but it's lost during report generation.
Solution:
- Extend IssueItem struct with remediation and vuln_type fields.
- Update flatten_issues to preserve SinkInfo remediation strings.
- Update issues.html and file_view.html to display a collapsible "Remediation" box.
[ ] Context-Aware Code Snippets (Priority: MEDIUM)
Problem: Issues are shown as one-liners without context.
Solution:
- Extract 3-5 lines of code around the issue location.
- Display syntax-highlighted snippets inline in the Issues tab.
[ ] Enriched Quality Messages (Priority: MEDIUM)
Problem: Generic messages like "Function too complex" offer no guidance.
Solution: Map rule IDs to specific refactoring advice (e.g., "Extract reusable logic into helper functions").
[ ] Prioritization Framework (Priority: LOW)
Problem: All high-severity issues look the same.
Solution: Add "Exploitability" and "Fix Effort" scores to help teams prioritize.
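
For the Context-Aware Code Snippets item, extracting the surrounding lines is straightforward. The sketch below is a hypothetical Python illustration (the report generator itself is Rust); a radius of 2 yields at most 5 lines, and fewer near the start or end of a file.

```python
def context_snippet(source: str, line: int, radius: int = 2) -> list[tuple[int, str]]:
    """Return (line_number, text) pairs around a 1-based issue line."""
    lines = source.splitlines()
    start = max(1, line - radius)
    end = min(len(lines), line + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


SOURCE = "l1\nl2\nl3\nl4\nl5\nl6\n"
print(context_snippet(SOURCE, 1))  # prints [(1, 'l1'), (2, 'l2'), (3, 'l3')]
```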

Benchmarking Infrastructure Ideas

| Component | Description | Tools/Approaches |
|-----------|-------------|------------------|
| Containerized Execution | Isolated, reproducible environments | Docker, Podman |
| Cross-Platform Matrix | Test on Windows, Linux, macOS | GitHub Actions matrix |
| Python Version Matrix | Test with Python 3.8-3.13 | tox, nox |
| Memory Profiling | Track peak RSS, allocations | tracemalloc, memory_profiler |
| CPU Profiling | Identify bottlenecks | py-spy, cProfile |
| Differential Testing | Compare outputs between tool versions | Custom diff scripts |
| Regression Testing | Detect accuracy/performance regressions | Baseline JSON comparison |

Suggested Future Improvements

Expand Ground Truth: Add more test cases for edge cases (decorators, type hints, async code)
Real-World Validation: Run on popular open-source projects (Django, Flask, requests)
Add MCC Metric: Better handles imbalanced detection categories
Per-File Breakdown: Show which specific test files each tool struggles with
Confidence Threshold Sweep: Test Vulture at multiple confidence levels (0%, 30%, 60%, 90%)
Cross-Language Comparison: Compare Python tools with similar tools for other languages

Phase 10: Deep Analysis & Security

Pushing the boundaries of static analysis.

[x] Secret Scanning 2.0
Enhance regex scanning with entropy analysis to reduce false positives for API keys.
[x] AST-Based Suspicious Variable Detection (Secret Scanning 3.0) ✅
AstRecognizer now walks assignments, annotations, attributes, and dict subscripts, emits CSP-S300 when suspicious names (password, secret, key, token, etc.) are assigned literal strings, and skips env vars, placeholders, and test files while lowering confidence (cytoscnpy/src/rules/secrets/recognizers.rs:296-438).
[x] Modular Secret Recognition Engine (Secret Scanning 4.0) ✅
SecretScanner orchestrates RegexRecognizer, AstRecognizer, EntropyRecognizer, and optional CustomRecognizer, feeds their raw matches to the ContextScorer, deduplicates findings by line, and filters by SecretsConfig thresholds like min_score, entropy_threshold, and suspicious_names (cytoscnpy/src/rules/secrets/mod.rs:1-138, cytoscnpy/src/config.rs:88-157, cytoscnpy/src/rules/secrets/scoring/mod.rs:1-147).
ContextScorer applies bonuses/penalties for keywords, entropy, comments, docstrings, placeholders, and env-var patterns before clamping to 0-100, so the modular engine already enforces the scoring rules described in the previous plan (cytoscnpy/src/rules/secrets/scoring/mod.rs:1-147).
[ ] Dependency Graph IN PROGRESS
Generate DOT/Mermaid graphs of module dependencies to aid refactoring.
Core CallGraph infrastructure implemented in cytoscnpy/src/taint/call_graph.rs.
[ ] License Compliance
Scan requirements.txt and Cargo.toml for incompatible licenses.
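
The entropy analysis mentioned under Secret Scanning 2.0 usually means Shannon entropy over the characters of a candidate string: random-looking keys score high, ordinary words score low. The sketch below is illustrative only; the 4.0 default for `entropy_threshold` is an assumed value, not the one in `SecretsConfig`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_random(candidate: str, entropy_threshold: float = 4.0) -> bool:
    """True when the string is varied enough to resemble a generated secret."""
    return shannon_entropy(candidate) >= entropy_threshold


print(shannon_entropy("abcd"))   # prints 2.0
print(looks_random("password"))  # prints False
```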

Phase 11: Auto-Remediation ✅ DONE

Safe, automated code fixes.

[x] Safe Code Removal (--fix)
Implementation: Use AST byte ranges from ruff_python_parser for precise removal.
Features:
- --fix flag removes unused functions, classes, and imports
- --dry-run previews changes without applying
- CST mode (tree-sitter) is now enabled by default for better comment preservation
- Only high-confidence items (≥90%) are auto-fixed
- Cascading detection: methods inside unused classes are auto-removed with their parent class
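
Removal by byte range can be sketched as below. It is a simplified Python illustration, not the Rust implementation: findings are given as hypothetical `(start_byte, end_byte, confidence)` tuples, spans are assumed not to overlap, and the 90 cut-off mirrors the confidence threshold listed above.

```python
def remove_spans(source: bytes, findings: list[tuple[int, int, int]],
                 min_confidence: int = 90) -> bytes:
    """Delete high-confidence byte ranges, working from the end of the file backwards."""
    spans = sorted(
        ((start, end) for start, end, confidence in findings
         if confidence >= min_confidence),
        reverse=True,  # later spans first, so earlier offsets stay valid
    )
    for start, end in spans:
        source = source[:start] + source[end:]
    return source


SRC = b"import os\nimport sys\nprint(sys.argv)\n"
print(remove_spans(SRC, [(0, 10, 100)]))  # prints b'import sys\nprint(sys.argv)\n'
```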

Phase 12: Security & Lifecycle

[ ] Fuzzing Environment Stabilization
Stabilize the fuzzing environment so that cargo fuzz runs reliably and catches edge-case crashes and undefined behavior.
Implementation: Add a fuzz-linux.yml workflow that runs on Ubuntu and uses cargo +nightly fuzz.

Phase 13: Interprocedural Taint Analysis

Deep data-flow analysis across function boundaries.

[ ] Global Call Graph Construction IN PROGRESS
Map function calls across the entire project to track how data moves between modules.
Necessary for tracking "taint" from a source in one file to a sink in another.
Status: cytoscnpy/src/taint/call_graph.rs already builds nodes, callee/caller edges, and qualifier handling; remaining work is propagating taint/sanitization through that graph.
[ ] Cross-Function Taint Tracking
Store and propagate "taint state" for function arguments and return values.
Goal: Catch vulnerabilities like an API request being passed through a helper function into an eval() or SQL query.
[ ] Sanitization Recognition
Detect when tainted data passes through "safe" functions (like html.escape() or custom sanitizers).
Benefit: Significantly reduces False Positives by knowing when data is no longer dangerous.
[ ] Framework-Specific Entry Points
Add deep support for FastAPI dependencies, Django middleware, and Flask request hooks.
Benefit: Provides "Premium" level security coverage for modern Python web applications.
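
To give a feel for how the pieces above fit together, here is a coarse sketch of propagating taint over a call graph with sanitizers acting as barriers. It works at function granularity only, whereas the planned analysis tracks individual arguments and return values; the graph shape and the `tainted_sinks` name are hypothetical.

```python
from collections import deque


def tainted_sinks(call_graph: dict[str, list[str]], sources: set[str],
                  sinks: set[str], sanitizers: set[str]) -> set[str]:
    """Breadth-first walk from taint sources; sanitizers stop propagation."""
    reached: set[str] = set()
    seen = set(sources)
    queue = deque(sources)
    while queue:
        func = queue.popleft()
        if func in sinks:
            reached.add(func)
        for callee in call_graph.get(func, []):
            if callee in sanitizers or callee in seen:
                continue  # data is cleaned here, or this function was already visited
            seen.add(callee)
            queue.append(callee)
    return reached


graph = {"handler": ["helper", "clean"], "helper": ["eval"], "clean": ["render"]}
print(tainted_sinks(graph, {"handler"}, {"eval", "render"}, {"clean"}))  # prints {'eval'}
```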

135 total ground truth items, 11 tools benchmarked