From Vulnerability to Patch: Building an Autonomous SAST Agent
Static Application Security Testing (SAST) tools find vulnerabilities. But then what? Someone has to read the report, understand the issue, write the fix, test it, and create a PR. That process takes hours or days per vulnerability.
I built a Python SAST Agent that handles the entire pipeline autonomously: scan → analyze → fix → PR. Here's how.
The Problem with Traditional SAST
Traditional SAST workflows have a massive gap between detection and remediation:
Traditional SAST Pipeline:
Code Push ──▶ SAST Scan ──▶ Report (100+ findings)
│
┌──────┴──────┐
│ Manual │
│ Triage │ ← Days/Weeks
│ + Fix │
│ + PR │
└──────┬──────┘
│
Fixed Code (maybe)
The reality:
- SAST tools generate 100+ findings per scan
- 60-70% are false positives
- Developers ignore reports because of noise
- Security debt grows indefinitely
My Solution: The Autonomous SAST Agent
Autonomous SAST Agent:
Code Push ──▶ Clone Repo ──▶ Run Scanners ──▶ AI Analysis
│
┌──────┴──────┐
│ For each │
│ true positive│
│ │
│ Generate Fix │
│ Run Tests │
│ Create PR │ ← Minutes
└──────┬──────┘
│
PRs with fixes
+ Slack alert
Architecture
The agent has 5 distinct stages:
Stage 1: Repository Cloning
import subprocess
import tempfile
def clone_repository(repo_url: str, branch: str = "main") -> str:
"""Clone the target repository to a temp directory."""
work_dir = tempfile.mkdtemp(prefix="sast_")
subprocess.run(
["git", "clone", "--depth=1", "--branch", branch, repo_url, work_dir],
check=True,
capture_output=True
)
return work_dir
Stage 2: Multi-Scanner Execution
I run multiple scanners for comprehensive coverage:
import json
def run_bandit(work_dir: str) -> list[dict]:
"""Run Bandit for Python security analysis."""
result = subprocess.run(
["bandit", "-r", work_dir, "-f", "json", "--severity-level", "medium"],
capture_output=True, text=True
)
findings = json.loads(result.stdout)
return [
{
"tool": "bandit",
"file": r["filename"],
"line": r["line_number"],
"severity": r["issue_severity"],
"confidence": r["issue_confidence"],
"issue": r["issue_text"],
"code": r["code"],
"cwe": r.get("issue_cwe", {}).get("id"),
"test_id": r["test_id"]
}
for r in findings.get("results", [])
]
def run_semgrep(work_dir: str) -> list[dict]:
"""Run Semgrep for pattern-based analysis."""
result = subprocess.run(
["semgrep", "--config=auto", "--json", work_dir],
capture_output=True, text=True
)
output = json.loads(result.stdout)
return [
{
"tool": "semgrep",
"file": r["path"],
"line": r["start"]["line"],
"severity": r["extra"]["severity"],
"issue": r["extra"]["message"],
"code": r["extra"].get("lines", ""),
"rule_id": r["check_id"]
}
for r in output.get("results", [])
]
def run_all_scanners(work_dir: str) -> list[dict]:
"""Run all SAST scanners and merge results."""
findings = []
findings.extend(run_bandit(work_dir))
findings.extend(run_semgrep(work_dir))
return deduplicate_findings(findings)
Stage 3: AI-Powered Triage
This is where AI eliminates false positives:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0)
TRIAGE_PROMPT = """
You are a senior security engineer. Analyze this SAST finding and determine
if it is a TRUE POSITIVE or FALSE POSITIVE.
Finding:
- Tool: {tool}
- Issue: {issue}
- Severity: {severity}
- File: {file}
- Line: {line}
- Code Context: (provided below)
Consider:
1. Is this actually exploitable in this context?
2. Are there already mitigations in the surrounding code?
3. Is this a testing/development file that wouldn't be in production?
Respond with a JSON object containing: verdict (true_positive or false_positive),
confidence (0.0-1.0), reasoning, and exploitability.
"""
async def triage_finding(finding: dict) -> dict:
"""Use AI to determine if a finding is a true positive."""
context = read_file_context(finding["file"], finding["line"], window=20)
prompt = TRIAGE_PROMPT.format(**finding) + f"\nCode:\n{context}"
response = await llm.ainvoke(prompt)
result = json.loads(response.content)
finding["triage"] = result
return finding
async def triage_all(findings: list[dict]) -> list[dict]:
"""Triage all findings, keeping only true positives."""
triaged = await asyncio.gather(*[triage_finding(f) for f in findings])
return [
f for f in triaged
if f["triage"]["verdict"] == "true_positive"
and f["triage"]["confidence"] >= 0.75
]
Stage 4: Automated Fix Generation
For each true positive, generate a code fix:
FIX_PROMPT = """
You are a senior Python developer fixing a security vulnerability.
Vulnerability: {issue}
Severity: {severity}
CWE: {cwe}
Current vulnerable code and full file content will be provided below.
Requirements:
1. Fix the security issue while preserving functionality
2. Follow Python best practices and PEP 8
3. Add a comment explaining the security fix
4. Do NOT change unrelated code
Return ONLY the fixed version of the affected function/block.
"""
async def generate_fix(finding: dict) -> dict:
"""Generate a code fix for a vulnerability."""
file_content = open(finding["file"]).read()
vulnerable_code = extract_vulnerable_block(file_content, finding["line"])
response = await llm.ainvoke(
FIX_PROMPT.format(
issue=finding["issue"],
severity=finding["severity"],
cwe=finding.get("cwe", "N/A"),
vulnerable_code=vulnerable_code,
file_content=file_content
)
)
finding["fix"] = {
"original": vulnerable_code,
"patched": response.content,
"file": finding["file"],
"line": finding["line"]
}
return finding
Stage 5: PR Creation with GitHub API
import httpx
from datetime import datetime
async def create_fix_pr(
repo: str,
findings: list[dict],
github_token: str
) -> str:
"""Create a PR with all security fixes."""
branch_name = f"security-fix/{datetime.now().strftime('%Y%m%d-%H%M%S')}"
headers = {
"Authorization": f"Bearer {github_token}",
"Accept": "application/vnd.github.v3+json"
}
async with httpx.AsyncClient() as client:
# 1. Create branch
main_ref = await get_main_ref(client, repo, headers)
await create_branch(client, repo, branch_name, main_ref, headers)
# 2. Apply fixes to files
for finding in findings:
fix = finding["fix"]
await update_file(
client, repo, branch_name,
fix["file"], fix["patched"],
f"fix: {finding['issue'][:50]}",
headers
)
# 3. Create PR
pr_body = generate_pr_description(findings)
pr = await client.post(
f"https://api.github.com/repos/{repo}/pulls",
headers=headers,
json={
"title": f"🔒 Security: Fix {len(findings)} vulnerabilities",
"body": pr_body,
"head": branch_name,
"base": "main"
}
)
return pr.json()["html_url"]
def generate_pr_description(findings: list[dict]) -> str:
"""Generate a detailed PR description."""
lines = ["## 🔒 Automated Security Fixes\n"]
lines.append("This PR was automatically generated by the SAST Agent.\n")
lines.append("### Vulnerabilities Fixed\n")
lines.append("| # | Severity | Issue | File | Line |")
lines.append("|---|----------|-------|------|------|")
for i, f in enumerate(findings, 1):
lines.append(
f"| {i} | {f['severity']} | {f['issue'][:60]} | "
f"`{f['file'].split('/')[-1]}` | L{f['line']} |"
)
lines.append("\n### Review Notes")
lines.append("- Each fix has been AI-generated and should be reviewed")
lines.append("- Run the test suite before merging")
lines.append("- Original vulnerability details are in inline comments")
return "\n".join(lines)
Results
After deploying this agent across 5 internal repositories:
| Metric | Value |
|---|---|
| Findings per scan (raw) | ~120 |
| True positives after triage | ~15 (87% noise reduction) |
| Auto-fix success rate | 78% (fixes that pass tests) |
| Average scan-to-PR time | 8 minutes |
| Vulnerabilities fixed per month | 45+ |
Lessons Learned
-
AI triage is the killer feature: Reducing noise from 120 to 15 findings made the tool actually useful.
-
Never auto-merge: Always create a PR for human review. AI-generated code fixes need human validation.
-
Run tests after fixing: If the fix breaks tests, flag it for manual review instead of creating a broken PR.
-
Start with high-confidence fixes: SQL injection and hardcoded secrets are easy to fix automatically. Complex logic vulnerabilities need human judgment.
The full SAST Agent source code is available on GitHub.