From Vulnerability to Patch: Building an Autonomous SAST Agent

2025-04-15•Security, SAST, AI, Python, DevSecOps

Static Application Security Testing (SAST) tools find vulnerabilities. But then what? Someone has to read the report, understand the issue, write the fix, test it, and create a PR. That process takes hours or days per vulnerability.

I built a Python SAST Agent that handles the entire pipeline autonomously: scan → analyze → fix → PR. Here's how.

The Problem with Traditional SAST

Traditional SAST workflows have a massive gap between detection and remediation:

Traditional SAST Pipeline:
  Code Push ──▶ SAST Scan ──▶ Report (100+ findings)
                                     │
                              ┌──────┴──────┐
                              │  Manual      │
                              │  Triage      │  ← Days/Weeks
                              │  + Fix       │
                              │  + PR        │
                              └──────┬──────┘
                                     │
                              Fixed Code (maybe)

The reality:

SAST tools generate 100+ findings per scan
60-70% are false positives
Developers ignore reports because of noise
Security debt grows indefinitely

My Solution: The Autonomous SAST Agent

Autonomous SAST Agent:
  Code Push ──▶ Clone Repo ──▶ Run Scanners ──▶ AI Analysis
                                                     │
                                              ┌──────┴──────┐
                                              │ For each     │
                                              │ true positive│
                                              │              │
                                              │ Generate Fix │
                                              │ Run Tests    │
                                              │ Create PR    │ ← Minutes
                                              └──────┬──────┘
                                                     │
                                              PRs with fixes
                                              + Slack alert

Architecture

The agent has 5 distinct stages:

Stage 1: Repository Cloning

import subprocess
import tempfile

def clone_repository(repo_url: str, branch: str = "main") -> str:
    """Clone the target repository to a temp directory."""
    work_dir = tempfile.mkdtemp(prefix="sast_")

    subprocess.run(
        ["git", "clone", "--depth=1", "--branch", branch, repo_url, work_dir],
        check=True,
        capture_output=True
    )

    return work_dir

Stage 2: Multi-Scanner Execution

I run multiple scanners for comprehensive coverage:

import json

def run_bandit(work_dir: str) -> list[dict]:
    """Run Bandit for Python security analysis."""
    result = subprocess.run(
        ["bandit", "-r", work_dir, "-f", "json", "--severity-level", "medium"],
        capture_output=True, text=True
    )

    findings = json.loads(result.stdout)
    return [
        {
            "tool": "bandit",
            "file": r["filename"],
            "line": r["line_number"],
            "severity": r["issue_severity"],
            "confidence": r["issue_confidence"],
            "issue": r["issue_text"],
            "code": r["code"],
            "cwe": r.get("issue_cwe", {}).get("id"),
            "test_id": r["test_id"]
        }
        for r in findings.get("results", [])
    ]

def run_semgrep(work_dir: str) -> list[dict]:
    """Run Semgrep for pattern-based analysis."""
    result = subprocess.run(
        ["semgrep", "--config=auto", "--json", work_dir],
        capture_output=True, text=True
    )

    output = json.loads(result.stdout)
    return [
        {
            "tool": "semgrep",
            "file": r["path"],
            "line": r["start"]["line"],
            "severity": r["extra"]["severity"],
            "issue": r["extra"]["message"],
            "code": r["extra"].get("lines", ""),
            "rule_id": r["check_id"]
        }
        for r in output.get("results", [])
    ]

def run_all_scanners(work_dir: str) -> list[dict]:
    """Run all SAST scanners and merge results."""
    findings = []
    findings.extend(run_bandit(work_dir))
    findings.extend(run_semgrep(work_dir))
    return deduplicate_findings(findings)

Stage 3: AI-Powered Triage

This is where AI eliminates false positives:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

TRIAGE_PROMPT = """
You are a senior security engineer. Analyze this SAST finding and determine
if it is a TRUE POSITIVE or FALSE POSITIVE.

Finding:
- Tool: {tool}
- Issue: {issue}
- Severity: {severity}
- File: {file}
- Line: {line}
- Code Context: (provided below)

Consider:
1. Is this actually exploitable in this context?
2. Are there already mitigations in the surrounding code?
3. Is this a testing/development file that wouldn't be in production?

Respond with a JSON object containing: verdict (true_positive or false_positive),
confidence (0.0-1.0), reasoning, and exploitability.
"""

async def triage_finding(finding: dict) -> dict:
    """Use AI to determine if a finding is a true positive."""
    context = read_file_context(finding["file"], finding["line"], window=20)

    prompt = TRIAGE_PROMPT.format(**finding) + f"\nCode:\n{context}"
    response = await llm.ainvoke(prompt)

    result = json.loads(response.content)
    finding["triage"] = result
    return finding

async def triage_all(findings: list[dict]) -> list[dict]:
    """Triage all findings, keeping only true positives."""
    triaged = await asyncio.gather(*[triage_finding(f) for f in findings])
    return [
        f for f in triaged
        if f["triage"]["verdict"] == "true_positive"
        and f["triage"]["confidence"] >= 0.75
    ]

Stage 4: Automated Fix Generation

For each true positive, generate a code fix:

FIX_PROMPT = """
You are a senior Python developer fixing a security vulnerability.

Vulnerability: {issue}
Severity: {severity}
CWE: {cwe}

Current vulnerable code and full file content will be provided below.

Requirements:
1. Fix the security issue while preserving functionality
2. Follow Python best practices and PEP 8
3. Add a comment explaining the security fix
4. Do NOT change unrelated code

Return ONLY the fixed version of the affected function/block.
"""

async def generate_fix(finding: dict) -> dict:
    """Generate a code fix for a vulnerability."""
    file_content = open(finding["file"]).read()
    vulnerable_code = extract_vulnerable_block(file_content, finding["line"])

    response = await llm.ainvoke(
        FIX_PROMPT.format(
            issue=finding["issue"],
            severity=finding["severity"],
            cwe=finding.get("cwe", "N/A"),
            vulnerable_code=vulnerable_code,
            file_content=file_content
        )
    )

    finding["fix"] = {
        "original": vulnerable_code,
        "patched": response.content,
        "file": finding["file"],
        "line": finding["line"]
    }
    return finding

Stage 5: PR Creation with GitHub API

import httpx
from datetime import datetime

async def create_fix_pr(
    repo: str,
    findings: list[dict],
    github_token: str
) -> str:
    """Create a PR with all security fixes."""
    branch_name = f"security-fix/{datetime.now().strftime('%Y%m%d-%H%M%S')}"

    headers = {
        "Authorization": f"Bearer {github_token}",
        "Accept": "application/vnd.github.v3+json"
    }

    async with httpx.AsyncClient() as client:
        # 1. Create branch
        main_ref = await get_main_ref(client, repo, headers)
        await create_branch(client, repo, branch_name, main_ref, headers)

        # 2. Apply fixes to files
        for finding in findings:
            fix = finding["fix"]
            await update_file(
                client, repo, branch_name,
                fix["file"], fix["patched"],
                f"fix: {finding['issue'][:50]}",
                headers
            )

        # 3. Create PR
        pr_body = generate_pr_description(findings)
        pr = await client.post(
            f"https://api.github.com/repos/{repo}/pulls",
            headers=headers,
            json={
                "title": f"🔒 Security: Fix {len(findings)} vulnerabilities",
                "body": pr_body,
                "head": branch_name,
                "base": "main"
            }
        )

        return pr.json()["html_url"]

def generate_pr_description(findings: list[dict]) -> str:
    """Generate a detailed PR description."""
    lines = ["## 🔒 Automated Security Fixes\n"]
    lines.append("This PR was automatically generated by the SAST Agent.\n")
    lines.append("### Vulnerabilities Fixed\n")
    lines.append("| # | Severity | Issue | File | Line |")
    lines.append("|---|----------|-------|------|------|")

    for i, f in enumerate(findings, 1):
        lines.append(
            f"| {i} | {f['severity']} | {f['issue'][:60]} | "
            f"`{f['file'].split('/')[-1]}` | L{f['line']} |"
        )

    lines.append("\n### Review Notes")
    lines.append("- Each fix has been AI-generated and should be reviewed")
    lines.append("- Run the test suite before merging")
    lines.append("- Original vulnerability details are in inline comments")

    return "\n".join(lines)

Results

After deploying this agent across 5 internal repositories:

Metric	Value
Findings per scan (raw)	~120
True positives after triage	~15 (87% noise reduction)
Auto-fix success rate	78% (fixes that pass tests)
Average scan-to-PR time	8 minutes
Vulnerabilities fixed per month	45+

Lessons Learned

AI triage is the killer feature: Reducing noise from 120 to 15 findings made the tool actually useful.
Never auto-merge: Always create a PR for human review. AI-generated code fixes need human validation.
Run tests after fixing: If the fix breaks tests, flag it for manual review instead of creating a broken PR.
Start with high-confidence fixes: SQL injection and hardcoded secrets are easy to fix automatically. Complex logic vulnerabilities need human judgment.

The full SAST Agent source code is available on GitHub.