
[AI Dev Tools] Chat with your dirs, Dependency Management, and Self-Verification

Source: https://arxiv.org/pdf/2405.12641v1

Dir-assistant: Chat with Directory Files using Local or API LLMs

Dir-assistant is a tool that enables chatting with files in your current directory using local or API-based LLMs, featuring CGRAG (Contextually Guided Retrieval-Augmented Generation) for improved accuracy.

Key Features:
  • Supports local LLMs via llama-cpp-python and API LLMs through LiteLLM, with platform support for various CPU and GPU architectures.
  • Implements file watching to automatically update the index when files change, eliminating the need for manual restarts.
  • Utilizes a RAG system with embedding models to identify relevant files for LLM processing.
  • Offers configuration options for both local and API LLMs, allowing customization of model parameters and API settings.
  • Provides file ignoring capabilities through command-line arguments and a global ignore list in the configuration file.
  • Example use: Navigate to a directory and run "dir-assistant" to start chatting with the files in that location.
  • Example use: Ignore specific files or directories by running "dir-assistant --ignore some-project-directory .git .gitignore".
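The retrieval step behind a tool like this can be illustrated with a minimal sketch: embed the user's question and each file's contents, then rank files by similarity so only the most relevant ones reach the LLM. This is not dir-assistant's actual implementation (which uses neural embedding models and CGRAG); the bag-of-words "embedding" and the file contents below are stand-ins for illustration.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG systems use neural embedding models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_files(query: str, files: dict, k: int = 2) -> list:
    # Rank files by similarity to the query and keep the top k for the LLM context.
    q = embed(query)
    ranked = sorted(files, key=lambda f: cosine(q, embed(files[f])), reverse=True)
    return ranked[:k]

# Hypothetical file summaries standing in for real directory contents.
files = {
    "auth.py": "user login and password check against the user database",
    "report.py": "render a sales report as an html table",
    "db.py": "open and close database connection pools",
}
print(top_files("how does user login work", files, k=2))
```

A second "contextually guided" pass (the CG in CGRAG) would feed these candidates back to the LLM to refine which files are truly relevant before answering.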
Source: https://github.com/curvedinf/dir-assistant

DepsRAG: Managing Software Dependencies with LLMs

DepsRAG is a proof-of-concept approach that uses Retrieval Augmented Generation (RAG) to manage software dependencies across four popular ecosystems.

  • The system constructs a Knowledge Graph (KG) of direct and transitive dependencies for software packages.
  • It answers user questions about dependencies by generating queries to retrieve information from the KG and augmenting LLM inputs with this data.
  • Web search capability is included to address questions beyond the KG's scope.
  • DepsRAG aims to simplify the complex task of understanding dependencies and revealing hidden properties such as dependency chains and depth.
  • While offering tangible benefits, the approach also has limitations that are acknowledged by the developers.
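The core idea of querying a dependency Knowledge Graph can be sketched in a few lines: store direct dependencies as edges, then derive the hidden properties the paper mentions, such as the full transitive closure and the dependency depth. The package names below are hypothetical, and a real KG (as in DepsRAG) would live in a graph database queried by LLM-generated queries rather than a Python dict.

```python
# Toy dependency knowledge graph: package -> direct dependencies.
deps = {
    "webapp": ["framework", "orm"],
    "framework": ["httplib"],
    "orm": ["dbdriver"],
    "httplib": [],
    "dbdriver": [],
}

def transitive_deps(pkg: str, graph: dict) -> set:
    """Collect all direct and transitive dependencies of pkg."""
    seen = set()
    stack = list(graph.get(pkg, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(graph.get(d, []))
    return seen

def depth(pkg: str, graph: dict) -> int:
    """Length of the longest dependency chain below pkg."""
    children = graph.get(pkg, [])
    return 0 if not children else 1 + max(depth(c, graph) for c in children)

print(sorted(transitive_deps("webapp", deps)))
print(depth("webapp", deps))  # longest chain: webapp -> framework -> httplib
```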
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: DepsRAG: Towards Managing Software Dependencies using Large Language Models

FunCoder: Recursive Function Decomposition for Complex Code Generation

FunCoder is a code generation framework that uses recursive function decomposition and consensus-based evaluation to improve performance on complex programming tasks.

  • The framework employs a divide-and-conquer strategy, breaking down complex requirements into smaller, manageable sub-functions organized in a tree hierarchy.
  • Sub-functions are composed to achieve more complex objectives, allowing for better handling of intricate programming requirements.
  • FunCoder uses functional consensus, selecting among candidate implementations by identifying similarities in their program behavior, which helps mitigate error propagation.
  • Benchmark results show FunCoder outperforms state-of-the-art methods by an average of 9.8% on HumanEval, MBPP, xCodeEval, and MATH datasets using GPT-3.5 and GPT-4.
  • The framework also enhances performance of smaller models, enabling StableCode-3b to surpass GPT-3.5 by 18.6% and achieve 97.7% of GPT-4's performance on HumanEval.
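The functional-consensus step can be sketched independently of the full framework: sample several candidate implementations of a sub-function, run them on shared inputs, and keep the candidate whose input→output behavior agrees with the most others. This is a simplified reading of the idea, with hand-written lambdas standing in for LLM-sampled code.

```python
from collections import Counter

def consensus(candidates: list, inputs: list):
    """Pick the candidate whose input->output behavior matches the most others."""
    signatures = []
    for f in candidates:
        sig = []
        for x in inputs:
            try:
                sig.append(f(x))
            except Exception:
                sig.append("<error>")
        signatures.append(tuple(sig))
    # Majority behavior wins; return the first candidate exhibiting it.
    most_common_sig, _ = Counter(signatures).most_common(1)[0]
    return candidates[signatures.index(most_common_sig)]

# Three hypothetical sampled implementations of "absolute value"; one is buggy.
cands = [
    lambda x: x if x >= 0 else -x,
    lambda x: abs(x),
    lambda x: x,  # buggy: wrong on negatives
]
best = consensus(cands, [-2, 0, 3])
print(best(-2))  # the buggy candidate is outvoted
```

Because agreement is measured on behavior rather than on source text, two syntactically different but equivalent functions still count toward the same consensus.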
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

ChatGPT's Self-Verification Capability in Code-Related Tasks: An Empirical Study

A comprehensive investigation evaluates ChatGPT's ability to self-verify its performance in code generation, completion, and repair tasks.

  • The study assesses ChatGPT's capability to generate correct code, complete code without vulnerabilities, and repair buggy code, followed by self-verification of these tasks.
  • Findings reveal that ChatGPT often incorrectly predicts its generated faulty code as correct, demonstrating self-contradictory hallucinations in its behavior.
  • Self-verification improves when ChatGPT is prompted with guiding questions, such as checking assertions against incorrectly generated or repaired code and probing for vulnerabilities in completed code.
  • ChatGPT-generated test reports can identify more vulnerabilities in completed code, but explanations for incorrectly generated code and failed repairs are mostly inaccurate.
  • The study provides implications for future research and development using ChatGPT in software development processes.
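The study's caution against trusting a model's "looks correct" judgment suggests a simple harness: check generated code against explicit assertions instead. The sketch below is not from the paper; the "generated" function and its off-by-one bug are hypothetical examples of the kind of fault self-verification tends to miss.

```python
def verify_with_assertions(func, test_cases: list) -> list:
    """Run explicit assertion checks on a generated function and
    collect failure descriptions, rather than trusting self-verification."""
    failures = []
    for args, expected in test_cases:
        try:
            result = func(*args)
            assert result == expected, f"{args} -> {result}, expected {expected}"
        except AssertionError as e:
            failures.append(str(e))
        except Exception as e:
            failures.append(f"{args} raised {type(e).__name__}")
    return failures

# Hypothetical "LLM-generated" function with an off-by-one bug.
def generated_sum_to_n(n):
    return sum(range(n))  # bug: should be range(n + 1)

failures = verify_with_assertions(generated_sum_to_n, [((3,), 6), ((0,), 0)])
print(failures)  # the assertion catches what "does this look correct?" might not
```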
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks?

LLM-Based Software Vulnerability Detection: A Benchmarking Study

A comprehensive study evaluating the effectiveness of LLMs in detecting software vulnerabilities, comparing their performance to traditional static analysis tools.

  • The study proposes using LLMs to assist in finding vulnerabilities in source code, leveraging their ability to understand and generate code.
  • Multiple state-of-the-art LLMs were tested to identify the best prompting strategies for optimal performance in vulnerability detection.
  • LLMs outperformed traditional static analysis tools in terms of recall and F1 scores, identifying a greater number of issues.
  • The research provides an overview of the strengths and weaknesses of the LLM-based approach to vulnerability detection.
  • Findings aim to benefit software developers and security analysts in ensuring code is free of vulnerabilities.
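The recall and F1 comparison underlying such a benchmark reduces to set arithmetic over reported versus ground-truth findings. The finding IDs below are invented for illustration; they are not data from the study.

```python
def precision_recall_f1(reported: set, ground_truth: set) -> tuple:
    """Score a detector's reported findings against known ground truth."""
    tp = len(reported & ground_truth)  # true positives
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f1

# Hypothetical findings: the LLM flags more true issues (higher recall)
# than the static analyzer, at some cost in precision.
truth = {"CWE-79#1", "CWE-89#2", "CWE-476#3", "CWE-22#4"}
llm = {"CWE-79#1", "CWE-89#2", "CWE-476#3", "CWE-787#9"}
static = {"CWE-79#1"}

print(precision_recall_f1(llm, truth))
print(precision_recall_f1(static, truth))
```

In this toy setup the static analyzer is perfectly precise but misses three of four real issues, mirroring the recall gap the study reports.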
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study