
[AI Dev Tools] Natural Language to Shell Commands, Code Management, Maturity Assessment ...


llm2sh: Natural Language to Shell Command Translator

llm2sh is a command-line utility that uses LLMs to translate plain-language requests into shell commands, letting users drive their system in natural language. A minimal sketch of the general pattern appears after the feature list.

Key Features:
  • Translates natural language requests into executable shell commands, leveraging various LLMs for generation.
  • Offers a customizable configuration file and supports multiple LLM providers, including OpenAI, Anthropic (Claude), and Groq.
  • Includes a YOLO mode for running commands without confirmation and a verbose mode for debugging.
  • Provides options for dry-run, model selection, and custom sampling temperature.
  • Installed via pip: `pip install llm2sh`.
  • Example use: `llm2sh "list all files in the current directory"` generates and prompts to run the `ls -a` command.
  • Example use: `llm2sh "install docker in rootless mode"` generates a series of commands for Docker installation and setup.
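
A minimal sketch of the translate-confirm-execute pattern such a tool follows; this is not llm2sh's actual implementation, and `ask_llm` is a placeholder for a real LLM API call:

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., OpenAI, Claude, or Groq)."""
    raise NotImplementedError

def nl_to_shell(request: str) -> None:
    # Ask the model for a single shell command satisfying the request.
    command = ask_llm(
        "Translate this request into one POSIX shell command; "
        f"reply with the command only: {request}"
    ).strip()
    # Confirm before executing -- the default (non-YOLO) behavior,
    # where the user is prompted before anything runs.
    if input(f"Run `{command}`? [y/N] ").lower() == "y":
        subprocess.run(command, shell=True)

# nl_to_shell("list all files in the current directory")
```
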
Source: https://github.com/randombk/llm2sh

Mandark: Lightweight AI Assistant for Code Management

Mandark is a compact (~80 KB) AI-powered tool that assists with a range of coding tasks, from codebase management to improving its own code.

Key Features:
  • Operates without installation via npx and supports multiple LLMs for editing and creating code across multiple files.
  • Offers command-line diff verification and installs necessary packages automatically.
  • Provides token and cost estimation before execution, so users can see the expected spend up front (a rough sketch of this kind of estimate follows the list).
  • Compatible with any codebase, allowing users to specify folders or individual files for processing.
  • Includes options to print line-tagged compiled code and to include import statements, offering flexibility in output and processing.
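
Mandark's exact accounting isn't documented here, but a pre-run estimate can be approximated as below; the characters-per-token ratio and the price are illustrative assumptions, not Mandark's actual figures:

```python
# Rough pre-run cost preview in the spirit of Mandark's token/cost estimate.
CHARS_PER_TOKEN = 4                 # common rough heuristic, an assumption here
PRICE_PER_1K_INPUT_TOKENS = 0.003   # hypothetical USD rate, not a real price

def estimate_cost(source_files: list[str]) -> tuple[int, float]:
    chars = 0
    for path in source_files:
        with open(path, encoding="utf-8") as f:
            chars += len(f.read())
    tokens = chars // CHARS_PER_TOKEN
    return tokens, tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

# tokens, usd = estimate_cost(["src/index.ts"])
# print(f"~{tokens} tokens, ~${usd:.4f} before any edit is sent")
```
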
Source: https://github.com/hrishioa/mandark

CoDAT: Code Documentation and Analysis Tool for Maintaining Consistency

CoDAT is a tool for keeping the different levels of code documentation consistent with the implementation, integrated into the IntelliJ IDEA IDE. The tool itself has not yet been released.

  • The tool links and updates comments to remain consistent with code changes, flagging "out of date" comments to alert developers.
  • A large language model checks semantic consistency between code fragments and their corresponding comments, identifying both semantic inconsistencies and outdated documentation.
  • CoDAT supports a step-wise refinement approach, helping programmers correctly implement code sketches through one or more refinement iterations.
  • The implementation uses the Code Insight daemon package and a custom regular-expression algorithm to mark tagged comments associated with changed code blocks (a toy version of this staleness check follows the list).
  • The backend is structurally decentralized, enabling a distributed-ledger framework for tracking code consistency and architectural compilation.
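
Since CoDAT itself is unreleased, the following is only a toy illustration of the staleness idea: a tagged comment records a short hash of the code it documents, and a mismatch flags the comment as out of date. The `@doc(...)` tag format is invented for this sketch:

```python
import hashlib
import re

# Invented tag format: "# @doc(<8-hex-char hash>) <comment text>"
TAG = re.compile(r"#\s*@doc\((?P<hash>[0-9a-f]{8})\)\s*(?P<text>.*)")

def code_hash(block: str) -> str:
    """Short fingerprint of the code block a comment documents."""
    return hashlib.sha256(block.encode()).hexdigest()[:8]

def is_stale(comment_line: str, code_block: str) -> bool:
    """True if the tagged comment no longer matches its code block."""
    m = TAG.match(comment_line.strip())
    return bool(m) and m.group("hash") != code_hash(code_block)

# is_stale("# @doc(1a2b3c4d) sorts the list in place", "def sort(xs): ...")
```
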
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Code Documentation and Analysis to Secure Software Development

Code LLM Maturity Assessment: Postcondition Generation Benchmark

A new benchmark assesses code LLM capabilities beyond code generation, focusing on postcondition generation as a task that exercises a broader set of skills.

  • The benchmark addresses limitations of existing code LLM evaluations, which primarily focus on code generation from natural language descriptions.
  • Postcondition generation requires LLMs to understand both code semantics and natural language, and to produce unambiguous conditions in a programming language.
  • Different kinds of postconditions demand different levels of capability, which makes the task well suited to gauging LLM maturity (see the illustration after this list).
  • The researchers augmented the EvalPlus dataset to create a postcondition testing benchmark and evaluated several open-source models.
  • Results highlight areas for improvement in code LLMs, providing insights for future development.
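
To make the task concrete, here is an illustration (not drawn from the paper's benchmark) of postconditions of increasing strength for a simple sorting function:

```python
def sort_list(xs: list[int]) -> list[int]:
    return sorted(xs)

# Postconditions of increasing strength for `result = sort_list(xs)`.
def post_length(xs, result):
    # Weak: only constrains the output's size.
    return len(result) == len(xs)

def post_sorted(xs, result):
    # Stronger: the output must be ordered.
    return all(result[i] <= result[i + 1] for i in range(len(result) - 1))

def post_permutation(xs, result):
    # Strongest: the output is an ordered permutation of the input.
    return sorted(xs) == result

xs = [3, 1, 2]
result = sort_list(xs)
assert post_length(xs, result) and post_sorted(xs, result) and post_permutation(xs, result)
```
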
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Beyond Code Generation: Assessing Code LLM Maturity with Postconditions

AgileGen: Agile-Based Generative Software Development with Human-AI Collaboration

AgileGen is a system that enhances generative software development through human-AI teamwork, using Agile methodologies and Gherkin for testable requirements.

  • The system addresses the problem of incomplete user requirements, which often prevents generated applications from implementing their full intended functionality.
  • AgileGen uses Gherkin to express testable requirements, keeping user needs and the generated code semantically consistent (a small illustration follows the list).
  • Human-AI collaboration is a key feature, allowing users to participate in decision-making processes where they excel, improving application functionality completeness.
  • A memory pool mechanism collects and recommends user decision-making scenarios, enhancing the reliability of future user interactions.
  • The approach outperformed existing methods by 16.4% and achieved higher user satisfaction, offering a user-friendly interactive system for software development.
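
AgileGen's own prompts and formats are not reproduced here; the sketch below, with an invented scenario, only illustrates why Gherkin makes requirements testable: each Given/When/Then step maps directly to a test's setup, action, or assertion:

```python
# Hypothetical Gherkin scenario (not from the AgileGen paper).
SCENARIO = """
Feature: User login
  Scenario: Reject wrong password
    Given a registered user "alice"
    When she logs in with an incorrect password
    Then the login is rejected with an error message
"""

def extract_steps(scenario: str) -> dict[str, list[str]]:
    """Group Given/When/Then lines; `Then` steps become acceptance checks."""
    steps = {"Given": [], "When": [], "Then": []}
    for line in scenario.splitlines():
        line = line.strip()
        for keyword in steps:
            if line.startswith(keyword):
                steps[keyword].append(line)
    return steps

print(extract_steps(SCENARIO)["Then"])
# ['Then the login is rejected with an error message']
```
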
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Empowering Agile-Based Generative Software Development through Human-AI Teamwork https://arxiv.org/pdf/2407.15568v1

LLMs for Autonomic Computing in Microservice Management

A study exploring the use of LLMs to realize the Vision of Autonomic Computing (ACV) through a multi-agent framework for microservice management.

  • The research introduces a five-level taxonomy for autonomous service maintenance, addressing the challenges of achieving self-managing computing systems.
  • An online evaluation benchmark based on the Sock Shop microservice demo project assesses the framework's performance.
  • Findings show significant progress toward Level 3 autonomy, demonstrating LLMs' effectiveness at detecting and resolving issues within microservice architectures (a schematic agent loop is sketched after the list).
  • The study contributes to advancing autonomic computing by integrating LLMs into microservice management frameworks.
  • Code for the project will be available at https://aka.ms/ACV-LLM.
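
The paper's multi-agent framework is not reproduced here; the sketch below shows a single monitor-analyze-plan-execute iteration in the spirit of autonomic computing's MAPE loop, with `ask_llm` and `check_health` as placeholder functions:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def check_health(service: str) -> dict:
    """Placeholder: poll the service's health endpoint or metrics."""
    raise NotImplementedError

def maintenance_step(service: str) -> None:
    # Monitor: gather symptoms from the running service.
    status = check_health(service)
    if status.get("healthy", True):
        return
    # Analyze + plan: delegate diagnosis and planning to the LLM.
    plan = ask_llm(
        f"Service {service} reports {status}. "
        "Diagnose the likely fault and propose one remediation step."
    )
    # Execute: a real agent would turn `plan` into a concrete action
    # (restart a pod, roll back a deploy); printed here for illustration.
    print(f"Proposed remediation for {service}: {plan}")
```
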
Tools you can use from the paper:
Code for the project will be available at https://aka.ms/ACV-LLM.

Source: The Vision of Autonomic Computing: Can LLMs Make It a Reality?

Automated User Feedback Processing for Software Development

A summary of techniques for processing user feedback to improve software development, addressing challenges of quantity and quality in feedback data.

  • User feedback from social media, product forums, and app stores provides valuable insights for software teams, aiding in understanding feature usage, identifying defects, and inspiring improvements.
  • Two main challenges stand out: managing the sheer quantity of feedback data and coping with its uneven quality, since items may be uninformative, repetitive, or incorrect.
  • The chapter surveys data mining, machine learning, and natural language processing techniques, including LLMs, for addressing these challenges (a toy triage example follows the list).
  • Guidance is provided for researchers and practitioners on implementing effective, actionable analysis of user feedback for software and requirements engineering.
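
As the simplest end of that spectrum, here is a toy keyword-based triage of feedback items; the categories and keywords are assumptions for illustration, not taken from the chapter, and an ML- or LLM-based classifier would replace the keyword matching in practice:

```python
# Invented categories and keywords, for illustration only.
CATEGORIES = {
    "bug report": ("crash", "error", "broken", "fails"),
    "feature request": ("please add", "would be nice", "wish", "support for"),
}

def triage(feedback: str) -> str:
    text = feedback.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"  # uninformative or repetitive items land here

reviews = [
    "App crashes when I rotate the screen",
    "Would be nice to have a dark mode",
    "Great app!!!",
]
print([triage(r) for r in reviews])
# ['bug report', 'feature request', 'other']
```
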
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: On the Automated Processing of User Feedback

CoDefeater: Automated Defeater Generation for Assurance Cases Using LLMs

CoDefeater is an automated process that uses LLMs to find defeaters in assurance cases for safety-critical systems.

  • Assurance cases are crucial for demonstrating the safety of critical systems in their intended environments.
  • Defeaters, which challenge claims in assurance cases, help identify weaknesses and prompt further investigation.
  • Traditionally, capturing defeaters relies on expert judgment and experience, requiring iterative refinement.
  • CoDefeater leverages LLMs to efficiently generate both known and unforeseen but feasible defeaters (a schematic prompt is sketched after the list).
  • Initial results on two systems show promise in enhancing the completeness and confidence of assurance cases.
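
The paper's actual prompts are not reproduced here; the sketch below shows one plausible shape for eliciting defeaters from an LLM, with `ask_llm` as a placeholder:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

# Hypothetical prompt shape; a defeater is a plausible reason the claim
# could fail to hold.
DEFEATER_PROMPT = (
    "You are reviewing a safety assurance case.\n"
    "Claim: {claim}\n"
    "List plausible defeaters: conditions, assumptions, or failure modes "
    "under which this claim would not hold. One per line."
)

def find_defeaters(claim: str) -> list[str]:
    response = ask_llm(DEFEATER_PROMPT.format(claim=claim))
    return [line.lstrip("- ").strip() for line in response.splitlines() if line.strip()]

# find_defeaters("The geofence prevents the UAV from entering restricted airspace.")
```
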
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: CoDefeater: Using LLMs To Find Defeaters in Assurance Cases

COMCAT: Improved Code Comprehension Through AI-Generated Comments

COMCAT is an approach that uses LLMs to generate informative code comments, significantly improving developer comprehension in software engineering tasks.

  • The system automates comment generation for C/C++ files by identifying suitable comment locations, predicting the most helpful comment type, and then generating the comment itself (this three-stage pipeline is sketched after the list).
  • Human subject evaluation showed COMCAT-generated comments improved code comprehension by up to 12% for 87% of participants across three software engineering tasks.
  • COMCAT comments were found to be as accurate and readable as human-generated ones, and preferred over standard ChatGPT-generated comments for up to 92% of code snippets.
  • A dataset containing source code snippets, human-written comments, and human-annotated comment categories was developed and released alongside the project.
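
A schematic of the three-stage pipeline described above; the stage functions are stubs standing in for the authors' models, not their implementation:

```python
def find_comment_locations(source: str) -> list[int]:
    """Stage 1 (stub): pick lines worth commenting, e.g. function headers."""
    raise NotImplementedError

def predict_comment_type(source: str, line: int) -> str:
    """Stage 2 (stub): choose the most helpful comment category for the spot."""
    raise NotImplementedError

def generate_comment(source: str, line: int, kind: str) -> str:
    """Stage 3 (stub): generate the comment with an LLM, given location and type."""
    raise NotImplementedError

def comcat_style_pipeline(source: str) -> dict[int, str]:
    """Map each chosen line to a generated comment of the predicted type."""
    return {
        line: generate_comment(source, line, predict_comment_type(source, line))
        for line in find_comment_locations(source)
    }
```
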
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization

Evidence-Based Practices in LLM Programming Assistants: An Evaluation

A study evaluating the alignment of LLM-based programming assistants with evidence-based software engineering practices.

  • The research investigated 17 evidence-based claims from empirical software engineering across five LLM-based programming assistants.
  • Findings reveal that these AI assistants have ambiguous beliefs regarding research claims and lack credible evidence to support their responses.
  • LLM-based programming assistants were found to be incapable of adopting practices demonstrated by empirical software engineering research to support development tasks.
  • The study provides implications for practitioners using LLM-based programming assistants in development contexts.
  • Researchers suggest future directions to enhance the reliability and trustworthiness of LLMs, aiming to increase awareness and adoption of evidence-based software engineering research findings in practice.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Exploring the Evidence-Based Beliefs and Behaviors of LLM-Based Programming Assistants

SqlCompose: AI-Assisted SQL Authoring at Meta

SqlCompose is a set of AI models developed by Meta for assisting with SQL authoring, addressing challenges specific to SQL's declarative nature and non-linear writing process.

  • An internal SQL benchmark was created at Meta for offline testing; the public Llama model achieved BLEU scores of 53% and 24% for single-line and multi-line predictions, respectively (a minimal BLEU-scoring example follows the list).
  • SqlComposeSA, a fine-tuned version of Llama on Meta's internal data and schemas, outperformed the base Llama model by 16 percentage points on BLEU score.
  • SqlComposeFIM, a fill-in-the-middle model aware of context before and after the lines to be completed, surpassed SqlComposeSA by 35 percentage points and correctly identified table names 75% of the time.
  • Deployed at Meta, SqlCompose is used weekly by over 10,000 users, with less than 1% opting to disable it. User feedback highlighted its usefulness in completing repetitive SQL clauses and suggesting boilerplate code.
  • Despite being smaller (7B and 13B parameters), the SqlCompose models consistently outperformed larger general-purpose LLMs, underscoring the potential of specialized models in specific domains.
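
For reference, this is how a BLEU score for a single-line SQL prediction can be computed with NLTK; it is a generic illustration, not Meta's internal benchmark harness:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "SELECT user_id, COUNT(*) FROM events GROUP BY user_id".split()
prediction = "SELECT user_id, COUNT(*) FROM events GROUP BY 1".split()

# Smoothing avoids zero scores on short sequences with missing n-grams.
score = sentence_bleu([reference], prediction,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")
```
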
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: AI-Assisted SQL Authoring at Industry Scale