2 min read

[AI Dev Tools] LLM Security Scanner, Code Efficiency Benchmark, and Repository Analysis

[AI Dev Tools] LLM Security Scanner, Code Efficiency Benchmark, and Repository Analysis
Source: https://github.com/msoedov/agentic_security

Agentic Security: Open-source LLM Vulnerability Scanner

Agentic Security is an open-source tool designed to scan and test Large Language Models (LLMs) for vulnerabilities and potential security risks.

Key Features:
  • Performs comprehensive fuzzing and employs various attack techniques to test LLMs, including customizable rule sets and agent-based attacks.
  • Integrates multiple existing security tools and datasets, such as Garak, InspectAI, and llm-adaptive-attacks, to provide a robust testing framework.
  • Offers LLM API integration and stress testing capabilities, allowing users to test their own LLM implementations.
  • Provides a user-friendly interface for managing scans and visualizing results.
  • Allows users to add custom datasets and extend the tool's functionality through CSV files or by implementing new data loaders.
  • Can be integrated into CI/CD pipelines for automated security checks during development processes.
Source: https://github.com/msoedov/agentic_security

ENAMEL: Efficiency Benchmark for LLM-Generated Code

ENAMEL (EfficeNcy AutoMatic EvaLuator) is a benchmark for evaluating the efficiency of code generated by LLMs, addressing a gap in existing evaluation frameworks that primarily focus on functional correctness.

  • The benchmark introduces a new efficiency metric called eff@k, generalizing the pass@k metric from correctness to efficiency and handling right-censored execution time.
  • An unbiased and variance-reduced estimator of eff@k is derived using Rao--Blackwellization, with a numerically stable implementation provided.
  • ENAMEL employs human expert-designed algorithms and implementations as reference solutions, setting a high standard for efficiency evaluation.
  • Rigorous evaluation is ensured through human-curated test case generators that filter out incorrect code and differentiate suboptimal algorithms.
  • An extensive study across 30 popular LLMs using ENAMEL reveals that current models struggle with designing advanced algorithms and implementation optimization, falling short of generating expert-level efficient code.
Tools you can use from the paper:

Source: How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark

RepoUnderstander: Comprehensive Software Repository Analysis for Automated Software Engineering

RepoUnderstander is a novel Automated Software Engineering (ASE) method designed to comprehensively analyze entire software repositories, addressing limitations in existing LLM-based approaches that focus primarily on local code information.

  • The method condenses critical repository information into a knowledge graph, reducing complexity and enabling a top-down understanding of the codebase.
  • A Monte Carlo tree search-based exploration strategy empowers agents to navigate and comprehend the entire repository effectively.
  • Agents are guided to summarize, analyze, and plan using repository-level knowledge, allowing them to dynamically acquire information and generate patches for real-world GitHub issues.
  • RepoUnderstander addresses challenges such as extremely long code inputs, noisy information, and complex dependency relationships within software systems.
  • Experimental results show an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent, demonstrating the method's effectiveness in ASE tasks.
Tools you can use from the paper:

Source: How to Understand Whole Software Repository?