12 Jun 2024 2 min read

[AI Dev Tools] LLM Security Scanner, Code Efficiency Benchmark, and Repository Analysis

Source: https://github.com/msoedov/agentic_security

Agentic Security: Open-source LLM Vulnerability Scanner

Agentic Security is an open-source tool designed to scan and test Large Language Models (LLMs) for vulnerabilities and potential security risks.

Key Features:

Performs comprehensive fuzzing and employs various attack techniques to test LLMs, including customizable rule sets and agent-based attacks.
Integrates multiple existing security tools and datasets, such as Garak, InspectAI, and llm-adaptive-attacks, to provide a robust testing framework.
Offers LLM API integration and stress testing capabilities, allowing users to test their own LLM implementations.
Provides a user-friendly interface for managing scans and visualizing results.
Allows users to add custom datasets and extend the tool's functionality through CSV files or by implementing new data loaders.
Can be integrated into CI/CD pipelines for automated security checks during development processes.

Source: https://github.com/msoedov/agentic_security

ENAMEL: Efficiency Benchmark for LLM-Generated Code

ENAMEL (EfficeNcy AutoMatic EvaLuator) is a benchmark for evaluating the efficiency of code generated by LLMs, addressing a gap in existing evaluation frameworks that primarily focus on functional correctness.

The benchmark introduces a new efficiency metric called eff@k, generalizing the pass@k metric from correctness to efficiency and handling right-censored execution time.
An unbiased and variance-reduced estimator of eff@k is derived using Rao--Blackwellization, with a numerically stable implementation provided.
ENAMEL employs human expert-designed algorithms and implementations as reference solutions, setting a high standard for efficiency evaluation.
Rigorous evaluation is ensured through human-curated test case generators that filter out incorrect code and differentiate suboptimal algorithms.
An extensive study across 30 popular LLMs using ENAMEL reveals that current models struggle with designing advanced algorithms and implementation optimization, falling short of generating expert-level efficient code.

Tools you can use from the paper:

https://github.com/q-rz/enamel

Source: How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark

RepoUnderstander: Comprehensive Software Repository Analysis for Automated Software Engineering

RepoUnderstander is a novel Automated Software Engineering (ASE) method designed to comprehensively analyze entire software repositories, addressing limitations in existing LLM-based approaches that focus primarily on local code information.

The method condenses critical repository information into a knowledge graph, reducing complexity and enabling a top-down understanding of the codebase.
A Monte Carlo tree search-based exploration strategy empowers agents to navigate and comprehend the entire repository effectively.
Agents are guided to summarize, analyze, and plan using repository-level knowledge, allowing them to dynamically acquire information and generate patches for real-world GitHub issues.
RepoUnderstander addresses challenges such as extremely long code inputs, noisy information, and complex dependency relationships within software systems.
Experimental results show an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent, demonstrating the method's effectiveness in ASE tasks.

Tools you can use from the paper:

https://github.com/RepoUnderstander/RepoUnderstander

Source: How to Understand Whole Software Repository?