
[AI Dev Tools] Chat with your dirs, Dependency Management, and Self-Verification

Source: https://arxiv.org/pdf/2405.12641v1

Dir-assistant: Chat with Directory Files using Local or API LLMs

Dir-assistant is a tool that enables chatting with files in your current directory using local or API-based LLMs, featuring CGRAG (Contextually Guided Retrieval-Augmented Generation) for improved accuracy.

Key Features:
  • Supports local LLMs via llama-cpp-python and API LLMs through LiteLLM, with platform support for various CPU and GPU architectures.
  • Implements file watching to automatically update the index when files change, eliminating the need for manual restarts.
  • Utilizes a RAG system with embedding models to identify relevant files for LLM processing.
  • Offers configuration options for both local and API LLMs, allowing customization of model parameters and API settings.
  • Provides file ignoring capabilities through command-line arguments and a global ignore list in the configuration file.
  • Example use: Navigate to a directory and run "dir-assistant" to start chatting with the files in that location.
  • Example use: Ignore specific files or directories by running "dir-assistant --ignore some-project-directory .git .gitignore".
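The retrieval step behind a tool like this can be illustrated with a minimal sketch: embed the user's question and each file's contents, then rank files by similarity so only the most relevant ones reach the LLM. This is not dir-assistant's actual implementation (which uses neural embedding models and CGRAG); the bag-of-words "embedding" and the file contents below are stand-ins for illustration.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG systems use neural embedding models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_files(query: str, files: dict, k: int = 2) -> list:
    # Rank files by similarity to the query and keep the top k for the LLM context.
    q = embed(query)
    ranked = sorted(files, key=lambda f: cosine(q, embed(files[f])), reverse=True)
    return ranked[:k]

# Hypothetical file summaries standing in for real directory contents.
files = {
    "auth.py": "user login and password check against the user database",
    "report.py": "render a sales report as an html table",
    "db.py": "open and close database connection pools",
}
print(top_files("how does user login work", files, k=2))
```

A second "contextually guided" pass (the CG in CGRAG) would feed these candidates back to the LLM to refine which files are truly relevant before answering.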
Source: https://github.com/curvedinf/dir-assistant

DepsRAG: Managing Software Dependencies with LLMs

DepsRAG is a proof-of-concept approach that uses Retrieval Augmented Generation (RAG) to manage software dependencies across four popular ecosystems.

  • The system constructs a Knowledge Graph (KG) of direct and transitive dependencies for software packages.
  • It answers user questions about dependencies by generating queries to retrieve information from the KG and augmenting LLM inputs with this data.
  • Web search capability is included to address questions beyond the KG's scope.
  • DepsRAG aims to simplify the complex task of understanding dependencies and revealing hidden properties such as dependency chains and depth.
  • While offering tangible benefits, the approach also has limitations that are acknowledged by the developers.
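The core idea of querying a dependency Knowledge Graph can be sketched in a few lines: store direct dependencies as edges, then derive the hidden properties the paper mentions, such as the full transitive closure and the dependency depth. The package names below are hypothetical, and a real KG (as in DepsRAG) would live in a graph database queried by LLM-generated queries rather than a Python dict.

```python
# Toy dependency knowledge graph: package -> direct dependencies.
deps = {
    "webapp": ["framework", "orm"],
    "framework": ["httplib"],
    "orm": ["dbdriver"],
    "httplib": [],
    "dbdriver": [],
}

def transitive_deps(pkg: str, graph: dict) -> set:
    """Collect all direct and transitive dependencies of pkg."""
    seen = set()
    stack = list(graph.get(pkg, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(graph.get(d, []))
    return seen

def depth(pkg: str, graph: dict) -> int:
    """Length of the longest dependency chain below pkg."""
    children = graph.get(pkg, [])
    return 0 if not children else 1 + max(depth(c, graph) for c in children)

print(sorted(transitive_deps("webapp", deps)))
print(depth("webapp", deps))  # longest chain: webapp -> framework -> httplib
```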
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: DepsRAG: Towards Managing Software Dependencies using Large Language Models

FunCoder: Recursive Function Decomposition for Complex Code Generation

FunCoder is a code generation framework that uses recursive function decomposition and consensus-based evaluation to improve performance on complex programming tasks.

  • The framework employs a divide-and-conquer strategy, breaking down complex requirements into smaller, manageable sub-functions organized in a tree hierarchy.
  • Sub-functions are composed to achieve more complex objectives, allowing for better handling of intricate programming requirements.
  • FunCoder uses functional consensus, selecting among candidate implementations by identifying similarities in their program behavior, which helps mitigate error propagation.
  • Benchmark results show FunCoder outperforms state-of-the-art methods by an average of 9.8% on HumanEval, MBPP, xCodeEval, and MATH datasets using GPT-3.5 and GPT-4.
  • The framework also enhances performance of smaller models, enabling StableCode-3b to surpass GPT-3.5 by 18.6% and achieve 97.7% of GPT-4's performance on HumanEval.
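The functional-consensus step can be sketched independently of the full framework: sample several candidate implementations of a sub-function, run them on shared inputs, and keep the candidate whose input→output behavior agrees with the most others. This is a simplified reading of the idea, with hand-written lambdas standing in for LLM-sampled code.

```python
from collections import Counter

def consensus(candidates: list, inputs: list):
    """Pick the candidate whose input->output behavior matches the most others."""
    signatures = []
    for f in candidates:
        sig = []
        for x in inputs:
            try:
                sig.append(f(x))
            except Exception:
                sig.append("<error>")
        signatures.append(tuple(sig))
    # Majority behavior wins; return the first candidate exhibiting it.
    most_common_sig, _ = Counter(signatures).most_common(1)[0]
    return candidates[signatures.index(most_common_sig)]

# Three hypothetical sampled implementations of "absolute value"; one is buggy.
cands = [
    lambda x: x if x >= 0 else -x,
    lambda x: abs(x),
    lambda x: x,  # buggy: wrong on negatives
]
best = consensus(cands, [-2, 0, 3])
print(best(-2))  # the buggy candidate is outvoted
```

Because agreement is measured on behavior rather than on source text, two syntactically different but equivalent functions still count toward the same consensus.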
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

ChatGPT's Self-Verification Capability in Code-Related Tasks: An Empirical Study

A comprehensive investigation evaluates ChatGPT's ability to self-verify its performance in code generation, completion, and repair tasks.

  • The study assesses ChatGPT's capability to generate correct code, complete code without vulnerabilities, and repair buggy code, followed by self-verification of these tasks.
  • Findings reveal that ChatGPT often incorrectly predicts its generated faulty code as correct, demonstrating self-contradictory hallucinations in its behavior.
  • Self-verification improves when ChatGPT is prompted with guiding questions, such as checking assertions against incorrectly generated or repaired code and probing for vulnerabilities in completed code.
  • ChatGPT-generated test reports can identify more vulnerabilities in completed code, but explanations for incorrectly generated code and failed repairs are mostly inaccurate.
  • The study provides implications for future research and development using ChatGPT in software development processes.
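The study's caution against trusting a model's "looks correct" judgment suggests a simple harness: check generated code against explicit assertions instead. The sketch below is not from the paper; the "generated" function and its off-by-one bug are hypothetical examples of the kind of fault self-verification tends to miss.

```python
def verify_with_assertions(func, test_cases: list) -> list:
    """Run explicit assertion checks on a generated function and
    collect failure descriptions, rather than trusting self-verification."""
    failures = []
    for args, expected in test_cases:
        try:
            result = func(*args)
            assert result == expected, f"{args} -> {result}, expected {expected}"
        except AssertionError as e:
            failures.append(str(e))
        except Exception as e:
            failures.append(f"{args} raised {type(e).__name__}")
    return failures

# Hypothetical "LLM-generated" function with an off-by-one bug.
def generated_sum_to_n(n):
    return sum(range(n))  # bug: should be range(n + 1)

failures = verify_with_assertions(generated_sum_to_n, [((3,), 6), ((0,), 0)])
print(failures)  # the assertion catches what "does this look correct?" might not
```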
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks?

LLM-Based Software Vulnerability Detection: A Benchmarking Study

A comprehensive study evaluating the effectiveness of LLMs in detecting software vulnerabilities, comparing their performance to traditional static analysis tools.

  • The study proposes using LLMs to assist in finding vulnerabilities in source code, leveraging their ability to understand and generate code.
  • Multiple state-of-the-art LLMs were tested to identify the best prompting strategies for optimal performance in vulnerability detection.
  • LLMs outperformed traditional static analysis tools in terms of recall and F1 scores, identifying a greater number of issues.
  • The research provides an overview of the strengths and weaknesses of the LLM-based approach to vulnerability detection.
  • Findings aim to benefit software developers and security analysts in ensuring code is free of vulnerabilities.
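The recall and F1 comparison underlying such a benchmark reduces to set arithmetic over reported versus ground-truth findings. The finding IDs below are invented for illustration; they are not data from the study.

```python
def precision_recall_f1(reported: set, ground_truth: set) -> tuple:
    """Score a detector's reported findings against known ground truth."""
    tp = len(reported & ground_truth)  # true positives
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f1

# Hypothetical findings: the LLM flags more true issues (higher recall)
# than the static analyzer, at some cost in precision.
truth = {"CWE-79#1", "CWE-89#2", "CWE-476#3", "CWE-22#4"}
llm = {"CWE-79#1", "CWE-89#2", "CWE-476#3", "CWE-787#9"}
static = {"CWE-79#1"}

print(precision_recall_f1(llm, truth))
print(precision_recall_f1(static, truth))
```

In this toy setup the static analyzer is perfectly precise but misses three of four real issues, mirroring the recall gap the study reports.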
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study