
[AI Dev Tools] Natural Language to Shell Commands, Code Management, Maturity Assessment ...


llm2sh: Natural Language to Shell Command Translator

llm2sh is a command-line utility that uses LLMs to translate plain-language requests into shell commands, letting users drive their system in natural language. A minimal sketch of the general pattern appears after the feature list.

Key Features:
  • Translates natural language requests into executable shell commands, leveraging various LLMs for generation.
  • Offers a customizable configuration file and supports multiple LLM providers, including OpenAI, Anthropic (Claude), and Groq.
  • Includes a YOLO mode for running commands without confirmation and a verbose mode for debugging.
  • Provides options for dry-run, model selection, and custom sampling temperature.
  • Installed via pip: `pip install llm2sh`.
  • Example use: `llm2sh "list all files in the current directory"` generates and prompts to run the `ls -a` command.
  • Example use: `llm2sh "install docker in rootless mode"` generates a series of commands for Docker installation and setup.
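
A minimal sketch of the translate-confirm-execute pattern such a tool follows; this is not llm2sh's actual implementation, and `ask_llm` is a placeholder for a real LLM API call:

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., OpenAI, Claude, or Groq)."""
    raise NotImplementedError

def nl_to_shell(request: str) -> None:
    # Ask the model for a single shell command satisfying the request.
    command = ask_llm(
        "Translate this request into one POSIX shell command; "
        f"reply with the command only: {request}"
    ).strip()
    # Confirm before executing -- the default (non-YOLO) behavior,
    # where the user is prompted before anything runs.
    if input(f"Run `{command}`? [y/N] ").lower() == "y":
        subprocess.run(command, shell=True)

# nl_to_shell("list all files in the current directory")
```
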
Source: https://github.com/randombk/llm2sh

Mandark: Lightweight AI Assistant for Code Management

Mandark is a compact (~80 KB) AI-powered tool that assists with a range of coding tasks, from codebase management to improving its own code.

Key Features:
  • Operates without installation via npx and supports multiple LLMs for editing and creating code across multiple files.
  • Offers command-line diff verification and installs necessary packages automatically.
  • Provides token and cost estimation before execution, so users can see the expected spend up front (a rough sketch of this kind of estimate follows the list).
  • Compatible with any codebase, allowing users to specify folders or individual files for processing.
  • Includes options to print line-tagged compiled code and to include import statements, offering flexibility in output and processing.
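
Mandark's exact accounting isn't documented here, but a pre-run estimate can be approximated as below; the characters-per-token ratio and the price are illustrative assumptions, not Mandark's actual figures:

```python
# Rough pre-run cost preview in the spirit of Mandark's token/cost estimate.
CHARS_PER_TOKEN = 4                 # common rough heuristic, an assumption here
PRICE_PER_1K_INPUT_TOKENS = 0.003   # hypothetical USD rate, not a real price

def estimate_cost(source_files: list[str]) -> tuple[int, float]:
    chars = 0
    for path in source_files:
        with open(path, encoding="utf-8") as f:
            chars += len(f.read())
    tokens = chars // CHARS_PER_TOKEN
    return tokens, tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

# tokens, usd = estimate_cost(["src/index.ts"])
# print(f"~{tokens} tokens, ~${usd:.4f} before any edit is sent")
```
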
Source: https://github.com/hrishioa/mandark

CoDAT: Code Documentation and Analysis Tool for Maintaining Consistency

CoDAT is a tool for keeping the different levels of code documentation consistent with the implementation, integrated into the IntelliJ IDEA IDE. The tool itself has not yet been released.

  • The tool links and updates comments to remain consistent with code changes, flagging "out of date" comments to alert developers.
  • A large language model checks semantic consistency between code fragments and their corresponding comments, identifying both semantic inconsistencies and outdated documentation.
  • CoDAT supports a step-wise refinement approach, helping programmers correctly implement code sketches through one or more refinement iterations.
  • The implementation uses the Code Insight daemon package and a custom regular-expression algorithm to mark tagged comments associated with changed code blocks (a toy version of this staleness check follows the list).
  • The backend is structurally decentralized, enabling a distributed-ledger framework for tracking code consistency and architectural compilation.
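
Since CoDAT itself is unreleased, the following is only a toy illustration of the staleness idea: a tagged comment records a short hash of the code it documents, and a mismatch flags the comment as out of date. The `@doc(...)` tag format is invented for this sketch:

```python
import hashlib
import re

# Invented tag format: "# @doc(<8-hex-char hash>) <comment text>"
TAG = re.compile(r"#\s*@doc\((?P<hash>[0-9a-f]{8})\)\s*(?P<text>.*)")

def code_hash(block: str) -> str:
    """Short fingerprint of the code block a comment documents."""
    return hashlib.sha256(block.encode()).hexdigest()[:8]

def is_stale(comment_line: str, code_block: str) -> bool:
    """True if the tagged comment no longer matches its code block."""
    m = TAG.match(comment_line.strip())
    return bool(m) and m.group("hash") != code_hash(code_block)

# is_stale("# @doc(1a2b3c4d) sorts the list in place", "def sort(xs): ...")
```
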
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Code Documentation and Analysis to Secure Software Development

Code LLM Maturity Assessment: Postcondition Generation Benchmark

A new benchmark assesses code LLM capabilities beyond code generation, focusing on postcondition generation as a task that exercises a broader set of skills.

  • The benchmark addresses limitations of existing code LLM evaluations, which primarily focus on code generation from natural language descriptions.
  • Postcondition generation requires LLMs to understand both code semantics and natural language, and to produce unambiguous conditions in a programming language.
  • Different kinds of postconditions demand different levels of capability, which makes the task well suited to gauging LLM maturity (see the illustration after this list).
  • The researchers augmented the EvalPlus dataset to create a postcondition testing benchmark and evaluated several open-source models.
  • Results highlight areas for improvement in code LLMs, providing insights for future development.
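
To make the task concrete, here is an illustration (not drawn from the paper's benchmark) of postconditions of increasing strength for a simple sorting function:

```python
def sort_list(xs: list[int]) -> list[int]:
    return sorted(xs)

# Postconditions of increasing strength for `result = sort_list(xs)`.
def post_length(xs, result):
    # Weak: only constrains the output's size.
    return len(result) == len(xs)

def post_sorted(xs, result):
    # Stronger: the output must be ordered.
    return all(result[i] <= result[i + 1] for i in range(len(result) - 1))

def post_permutation(xs, result):
    # Strongest: the output is an ordered permutation of the input.
    return sorted(xs) == result

xs = [3, 1, 2]
result = sort_list(xs)
assert post_length(xs, result) and post_sorted(xs, result) and post_permutation(xs, result)
```
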
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Beyond Code Generation: Assessing Code LLM Maturity with Postconditions

AgileGen: Agile-Based Generative Software Development with Human-AI Collaboration

AgileGen is a system that enhances generative software development through human-AI teamwork, using Agile methodologies and Gherkin for testable requirements.

  • The system addresses the problem of incomplete user requirements, which often prevents generated applications from implementing their full intended functionality.
  • AgileGen uses Gherkin to express testable requirements, keeping user needs and the generated code semantically consistent (a small illustration follows the list).
  • Human-AI collaboration is a key feature, allowing users to participate in decision-making processes where they excel, improving application functionality completeness.
  • A memory pool mechanism collects and recommends user decision-making scenarios, enhancing the reliability of future user interactions.
  • The approach outperformed existing methods by 16.4% and achieved higher user satisfaction, offering a user-friendly interactive system for software development.
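
AgileGen's own prompts and formats are not reproduced here; the sketch below, with an invented scenario, only illustrates why Gherkin makes requirements testable: each Given/When/Then step maps directly to a test's setup, action, or assertion:

```python
# Hypothetical Gherkin scenario (not from the AgileGen paper).
SCENARIO = """
Feature: User login
  Scenario: Reject wrong password
    Given a registered user "alice"
    When she logs in with an incorrect password
    Then the login is rejected with an error message
"""

def extract_steps(scenario: str) -> dict[str, list[str]]:
    """Group Given/When/Then lines; `Then` steps become acceptance checks."""
    steps = {"Given": [], "When": [], "Then": []}
    for line in scenario.splitlines():
        line = line.strip()
        for keyword in steps:
            if line.startswith(keyword):
                steps[keyword].append(line)
    return steps

print(extract_steps(SCENARIO)["Then"])
# ['Then the login is rejected with an error message']
```
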
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Empowering Agile-Based Generative Software Development through Human-AI Teamwork https://arxiv.org/pdf/2407.15568v1

LLMs for Autonomic Computing in Microservice Management

A study exploring the use of LLMs to realize the Vision of Autonomic Computing (ACV) through a multi-agent framework for microservice management.

  • The research introduces a five-level taxonomy for autonomous service maintenance, addressing the challenges of achieving self-managing computing systems.
  • An online evaluation benchmark based on the Sock Shop microservice demo project assesses the framework's performance.
  • Findings show significant progress toward Level 3 autonomy, demonstrating LLMs' effectiveness at detecting and resolving issues within microservice architectures (a schematic agent loop is sketched after the list).
  • The study contributes to advancing autonomic computing by integrating LLMs into microservice management frameworks.
  • Code for the project will be available at https://aka.ms/ACV-LLM.
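
The paper's multi-agent framework is not reproduced here; the sketch below shows a single monitor-analyze-plan-execute iteration in the spirit of autonomic computing's MAPE loop, with `ask_llm` and `check_health` as placeholder functions:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def check_health(service: str) -> dict:
    """Placeholder: poll the service's health endpoint or metrics."""
    raise NotImplementedError

def maintenance_step(service: str) -> None:
    # Monitor: gather symptoms from the running service.
    status = check_health(service)
    if status.get("healthy", True):
        return
    # Analyze + plan: delegate diagnosis and planning to the LLM.
    plan = ask_llm(
        f"Service {service} reports {status}. "
        "Diagnose the likely fault and propose one remediation step."
    )
    # Execute: a real agent would turn `plan` into a concrete action
    # (restart a pod, roll back a deploy); printed here for illustration.
    print(f"Proposed remediation for {service}: {plan}")
```
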
Tools you can use from the paper:
Code for the project will be available at https://aka.ms/ACV-LLM.

Source: The Vision of Autonomic Computing: Can LLMs Make It a Reality?

Automated User Feedback Processing for Software Development

A summary of techniques for processing user feedback to improve software development, addressing challenges of quantity and quality in feedback data.

  • User feedback from social media, product forums, and app stores provides valuable insights for software teams, aiding in understanding feature usage, identifying defects, and inspiring improvements.
  • Two main challenges stand out: managing the sheer quantity of feedback data and coping with its uneven quality, since items may be uninformative, repetitive, or incorrect.
  • The chapter surveys data mining, machine learning, and natural language processing techniques, including LLMs, for addressing these challenges (a toy triage example follows the list).
  • Guidance is provided for researchers and practitioners on implementing effective, actionable analysis of user feedback for software and requirements engineering.
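
As the simplest end of that spectrum, here is a toy keyword-based triage of feedback items; the categories and keywords are assumptions for illustration, not taken from the chapter, and an ML- or LLM-based classifier would replace the keyword matching in practice:

```python
# Invented categories and keywords, for illustration only.
CATEGORIES = {
    "bug report": ("crash", "error", "broken", "fails"),
    "feature request": ("please add", "would be nice", "wish", "support for"),
}

def triage(feedback: str) -> str:
    text = feedback.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"  # uninformative or repetitive items land here

reviews = [
    "App crashes when I rotate the screen",
    "Would be nice to have a dark mode",
    "Great app!!!",
]
print([triage(r) for r in reviews])
# ['bug report', 'feature request', 'other']
```
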
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: On the Automated Processing of User Feedback

CoDefeater: Automated Defeater Generation for Assurance Cases Using LLMs

CoDefeater is an automated process that uses LLMs to find defeaters in assurance cases for safety-critical systems.

  • Assurance cases are crucial for demonstrating the safety of critical systems in their intended environments.
  • Defeaters, which challenge claims in assurance cases, help identify weaknesses and prompt further investigation.
  • Traditionally, capturing defeaters relies on expert judgment and experience, requiring iterative refinement.
  • CoDefeater leverages LLMs to efficiently generate both known and unforeseen but feasible defeaters (a schematic prompt is sketched after the list).
  • Initial results on two systems show promise in enhancing the completeness and confidence of assurance cases.
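
The paper's actual prompts are not reproduced here; the sketch below shows one plausible shape for eliciting defeaters from an LLM, with `ask_llm` as a placeholder:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

# Hypothetical prompt shape; a defeater is a plausible reason the claim
# could fail to hold.
DEFEATER_PROMPT = (
    "You are reviewing a safety assurance case.\n"
    "Claim: {claim}\n"
    "List plausible defeaters: conditions, assumptions, or failure modes "
    "under which this claim would not hold. One per line."
)

def find_defeaters(claim: str) -> list[str]:
    response = ask_llm(DEFEATER_PROMPT.format(claim=claim))
    return [line.lstrip("- ").strip() for line in response.splitlines() if line.strip()]

# find_defeaters("The geofence prevents the UAV from entering restricted airspace.")
```
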
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: CoDefeater: Using LLMs To Find Defeaters in Assurance Cases

COMCAT: Improved Code Comprehension Through AI-Generated Comments

COMCAT is an approach that uses LLMs to generate informative code comments, significantly improving developer comprehension in software engineering tasks.

  • The system automates comment generation for C/C++ files by identifying suitable comment locations, predicting the most helpful comment type, and then generating the comment itself (this three-stage pipeline is sketched after the list).
  • Human subject evaluation showed COMCAT-generated comments improved code comprehension by up to 12% for 87% of participants across three software engineering tasks.
  • COMCAT comments were found to be as accurate and readable as human-generated ones, and preferred over standard ChatGPT-generated comments for up to 92% of code snippets.
  • A dataset containing source code snippets, human-written comments, and human-annotated comment categories was developed and released alongside the project.
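
A schematic of the three-stage pipeline described above; the stage functions are stubs standing in for the authors' models, not their implementation:

```python
def find_comment_locations(source: str) -> list[int]:
    """Stage 1 (stub): pick lines worth commenting, e.g. function headers."""
    raise NotImplementedError

def predict_comment_type(source: str, line: int) -> str:
    """Stage 2 (stub): choose the most helpful comment category for the spot."""
    raise NotImplementedError

def generate_comment(source: str, line: int, kind: str) -> str:
    """Stage 3 (stub): generate the comment with an LLM, given location and type."""
    raise NotImplementedError

def comcat_style_pipeline(source: str) -> dict[int, str]:
    """Map each chosen line to a generated comment of the predicted type."""
    return {
        line: generate_comment(source, line, predict_comment_type(source, line))
        for line in find_comment_locations(source)
    }
```
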
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization

Evidence-Based Practices in LLM Programming Assistants: An Evaluation

A study evaluating the alignment of LLM-based programming assistants with evidence-based software engineering practices.

  • The research investigated 17 evidence-based claims from empirical software engineering across five LLM-based programming assistants.
  • Findings reveal that these AI assistants have ambiguous beliefs regarding research claims and lack credible evidence to support their responses.
  • LLM-based programming assistants were found to be incapable of adopting practices demonstrated by empirical software engineering research to support development tasks.
  • The study provides implications for practitioners using LLM-based programming assistants in development contexts.
  • Researchers suggest future directions to enhance the reliability and trustworthiness of LLMs, aiming to increase awareness and adoption of evidence-based software engineering research findings in practice.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Exploring the Evidence-Based Beliefs and Behaviors of LLM-Based Programming Assistants

SqlCompose: AI-Assisted SQL Authoring at Meta

SqlCompose is a set of AI models developed by Meta for assisting with SQL authoring, addressing challenges specific to SQL's declarative nature and non-linear writing process.

  • An internal SQL benchmark was created at Meta for offline testing; the public Llama model achieved BLEU scores of 53% and 24% for single-line and multi-line predictions, respectively (a minimal BLEU-scoring example follows the list).
  • SqlComposeSA, a fine-tuned version of Llama on Meta's internal data and schemas, outperformed the base Llama model by 16 percentage points on BLEU score.
  • SqlComposeFIM, a fill-in-the-middle model aware of context before and after the lines to be completed, surpassed SqlComposeSA by 35 percentage points and correctly identified table names 75% of the time.
  • Deployed at Meta, SqlCompose is used weekly by over 10,000 users, with less than 1% opting to disable it. User feedback highlighted its usefulness in completing repetitive SQL clauses and suggesting boilerplate code.
  • Despite being smaller (7B and 13B parameters), the SqlCompose models consistently outperformed larger general-purpose LLMs, underscoring the potential of specialized models in specific domains.
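
For reference, this is how a BLEU score for a single-line SQL prediction can be computed with NLTK; it is a generic illustration, not Meta's internal benchmark harness:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "SELECT user_id, COUNT(*) FROM events GROUP BY user_id".split()
prediction = "SELECT user_id, COUNT(*) FROM events GROUP BY 1".split()

# Smoothing avoids zero scores on short sequences with missing n-grams.
score = sentence_bleu([reference], prediction,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")
```
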
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: AI-Assisted SQL Authoring at Industry Scale