
[AI Dev Tools] AI-Powered Code Editor, Terminal Command Generator, Coding Plugin for Neovim ...

Melty: AI-Powered Code Editor for Enhanced Productivity

Melty is an open-source AI code editor designed to collaborate with developers throughout their workflow, from terminal to GitHub, to produce production-ready code.

Key Features:
  • Integrates with the entire development workflow, including compiler, terminal, debugger, and tools like Linear and GitHub.
  • Performs large-scale refactoring across multiple files.
  • Creates web applications from scratch.
  • Assists in navigating large codebases efficiently.
  • Automatically writes commit messages based on code changes.
  • Adapts to and learns from the user's codebase over time.
  • Functions as a pair programmer, observing and assisting with every code change.
Source: https://github.com/meltylabs/melty

LLM-Term: AI-Powered Terminal Command Generator

LLM-Term is a Rust-based CLI tool that generates and executes terminal commands using OpenAI's language models or local Ollama models, streamlining command-line interactions.

Key Features:
  • Generates and executes terminal commands based on user prompts, supporting both PowerShell and Unix-like shells.
  • Offers configurable model selection and token limits, with options for OpenAI's GPT-4o, GPT-4o Mini, or local Ollama models.
  • Provides a user-friendly interface with command confirmation before execution, enhancing safety and control.
  • Supports custom configuration through a JSON file, allowing users to set default models and token limits.
  • Includes flexible installation options, with pre-built binaries available for quick setup or the ability to build from source.
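
The generate-confirm-execute loop described above is easy to picture in code. The snippet below is a minimal Python illustration of that loop, not the tool's actual Rust implementation; generate_command is a stub standing in for whichever OpenAI or Ollama model the configuration selects.

```python
import subprocess

def generate_command(prompt: str) -> str:
    """Stand-in for the LLM call (OpenAI or Ollama in llm-term).
    Kept as a stub so the sketch stays self-contained."""
    # A real implementation would send `prompt` plus a system message like
    # "Reply with a single shell command, nothing else" to the configured model.
    return "ls -la"  # placeholder output

def run_with_confirmation(prompt: str) -> None:
    command = generate_command(prompt).strip()
    print(f"Suggested command:\n  {command}")
    # Confirmation before execution is the safety step llm-term describes.
    if input("Execute? [y/N] ").strip().lower() == "y":
        subprocess.run(command, shell=True)
    else:
        print("Aborted.")

if __name__ == "__main__":
    run_with_confirmation("list all files, including hidden ones")
```
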
Source: https://github.com/dh1011/llm-term

nvim.ai: AI-Assisted Coding Plugin for Neovim

nvim.ai is a Neovim plugin that integrates AI-assisted coding and chat capabilities into the editor, offering context-aware assistance, inline code insertion, and support for multiple LLM providers.

Key Features:
  • Chat with buffers functionality allows interactive AI assistance with code and documents.
  • Inline assistant for code insertion and rewriting, triggered by user prompts.
  • Context-aware AI assistance leverages current work for relevant help.
  • Supports multiple LLM providers, including local options like Ollama and various cloud services.
  • Implements slash commands for buffer and diagnostic interactions.
  • Customizable configuration options for providers, models, and keymaps.
  • Integrates with nvim-cmp for command autocompletion.
Source: https://github.com/magicalne/nvim.ai

MarsCode Agent: LLM-Powered Automated Bug Fixing Framework

MarsCode Agent is a framework that leverages LLMs to automatically identify and repair bugs in software code, combining advanced code analysis techniques for accurate fault localization and patch generation.

  • The framework follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes (this staged flow is sketched after the list).
  • MarsCode Agent addresses the challenges of applying LLMs to automated bug fixing in complex and diverse real-world software systems.
  • Evaluation on SWE-bench, a benchmark built from real-world software projects, demonstrated a higher bug-fixing success rate than most existing automated approaches.
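
The staged pipeline can be pictured as a simple skeleton: plan, reproduce, localize, generate candidate patches, validate. The Python sketch below only illustrates that structure; it is not MarsCode Agent's implementation, and every function here is a hypothetical placeholder.

```python
from dataclasses import dataclass

@dataclass
class BugReport:
    description: str
    repo_path: str

def plan(report: BugReport) -> list[str]:
    """Hypothetical: ask an LLM to break the report into concrete steps."""
    return ["reproduce", "localize", "patch", "validate"]

def reproduce(report: BugReport) -> str:
    """Hypothetical: synthesize a failing test that triggers the bug."""
    return "test_reproduce_bug.py"

def localize(report: BugReport, failing_test: str) -> list[str]:
    """Hypothetical: combine code analysis and LLM reasoning to rank suspect files."""
    return ["src/module.py"]

def generate_patches(report: BugReport, suspects: list[str]) -> list[str]:
    """Hypothetical: have the LLM propose candidate diffs for the suspect code."""
    return ["candidate_patch_1.diff"]

def validate(patch: str, failing_test: str) -> bool:
    """Hypothetical: apply the patch, then re-run the failing test and the suite."""
    return True

def fix_bug(report: BugReport) -> str | None:
    plan(report)                                      # planning
    failing_test = reproduce(report)                  # bug reproduction
    suspects = localize(report, failing_test)         # fault localization
    for patch in generate_patches(report, suspects):  # candidate patch generation
        if validate(patch, failing_test):             # validation
            return patch
    return None
```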

Source: MarsCode Agent: AI-native Automated Bug Fixing

LLMs as Evaluators for Bug Report Summarization

A study investigating the effectiveness of LLMs in evaluating bug report summarization, comparing their performance to human evaluators.

  • The experiment involved three LLMs (GPT-4o, LLaMA-3, and Gemini) and human evaluators, tasked with selecting the correct bug report titles and summaries from given options (a minimal sketch of this multiple-choice setup follows the list).
  • Results showed LLMs performed well in evaluating bug report summaries, with GPT-4o outperforming other models.
  • Both humans and LLMs demonstrated consistent decision-making, but human evaluators experienced fatigue over time, affecting their accuracy.
  • The study suggests LLMs have potential as automated evaluators for bug report summarization, potentially allowing for scaled-up evaluations while reducing human effort and fatigue.
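
The evaluation setup maps naturally onto a multiple-choice prompt: show the model a bug report plus candidate summaries and ask it to pick one. The Python sketch below assumes a generic ask_llm callable and is an illustration of the setup, not the study's actual harness.

```python
import string

LETTERS = string.ascii_uppercase

def build_choice_prompt(bug_report: str, candidates: list[str]) -> str:
    """Format a bug report plus candidate summaries as a multiple-choice question."""
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(candidates))
    return (
        "Below is a bug report followed by candidate summaries.\n"
        "Reply with the single letter of the summary that best describes the report.\n\n"
        f"Bug report:\n{bug_report}\n\nCandidates:\n{options}\n\nAnswer:"
    )

def accuracy(ask_llm, dataset) -> float:
    """dataset: iterable of (bug_report, candidates, correct_index) tuples (assumed schema).
    ask_llm: any callable taking a prompt string and returning the model's text."""
    correct, total = 0, 0
    for bug_report, candidates, answer_idx in dataset:
        reply = ask_llm(build_choice_prompt(bug_report, candidates)).strip().upper()
        picked = LETTERS.index(reply[0]) if reply and reply[0] in LETTERS else -1
        correct += int(picked == answer_idx)
        total += 1
    return correct / max(total, 1)
```
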

Source: LLMs as Evaluators: A Novel Approach to Evaluate Bug Report Summarization

WebApp1K: Benchmarking LLMs on Web App Code Generation

A study evaluating 16 frontier LLMs on the WebApp1K benchmark, which assesses web application code generation capabilities.

  • All tested models demonstrated similar underlying knowledge, with performance differences primarily attributed to the frequency of mistakes.
  • Analysis of lines of code and failure distributions revealed that generating correct code is more complex than producing incorrect code.
  • Prompt engineering showed limited effectiveness in reducing errors, only proving useful in specific cases.
  • Findings suggest that future improvements in coding LLMs should focus on enhancing model reliability and minimizing mistakes.

Source: Insights from Benchmarking Frontier Language Models on Web App Code Generation

GitHub User Privacy Awareness: An Empirical Study

A study examining privacy setting usage and sensitive information disclosure on GitHub, analyzing data from 6,132 developers' pull request comments.

  • The research investigates how developers utilize GitHub's privacy settings and identifies types of sensitive information shared.
  • Findings reveal active engagement with available privacy settings, but also instances of private information disclosure in pull request comments.
  • Researchers explored sensitivity detection using an LLM and BERT, aiming to develop a personalized privacy assistant.
  • The study provides insights into the use and limitations of existing privacy protection tools on the platform.
  • Results offer motivation and methodology for creating improved, personalized privacy protection tools for GitHub users.

Source: Exploring User Privacy Awareness on GitHub: An Empirical Study

APITestGenie: Automated API Test Generation Using LLMs

APITestGenie is an approach and tool that leverages LLMs to generate executable API test scripts from business requirements and API specifications.

  • Designed to address the lack of studies exploring LLMs for testing Web APIs, which are fundamental to modern software systems and present significant testing challenges.
  • In experiments with 10 real-world APIs, the tool generated valid test scripts 57% of the time, increasing to 80% with three generation attempts per task.
  • Human intervention is recommended to validate or refine generated scripts before integration into CI/CD pipelines, positioning APITestGenie as a productivity assistant rather than a replacement for testers.
  • Feedback from industry specialists indicated strong interest in adopting the tool to improve the API test process.
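
At its core, the approach feeds a business requirement plus the relevant API specification to an LLM and asks for an executable test script. The snippet below is a hedged Python sketch of that prompt-and-generate step (here targeting pytest and the requests library), using a generic ask_llm callable rather than APITestGenie's actual prompts or tooling.

```python
from pathlib import Path

PROMPT_TEMPLATE = """You are generating an API test.
Business requirement:
{requirement}

OpenAPI specification (relevant excerpt):
{spec}

Write a self-contained pytest test module that uses the `requests` library
to call the API and assert the behaviour described in the requirement.
Return only Python code."""

def generate_api_test(ask_llm, requirement: str, spec_excerpt: str, out_path: str) -> Path:
    """ask_llm: any callable that takes a prompt string and returns the model's text.
    The generated script is written to disk so a human can review it before it is
    run or added to a CI/CD pipeline, as the paper recommends."""
    prompt = PROMPT_TEMPLATE.format(requirement=requirement, spec=spec_excerpt)
    script = ask_llm(prompt)
    path = Path(out_path)
    path.write_text(script, encoding="utf-8")
    return path
```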

Source: APITestGenie: Automated API Test Generation through Generative AI

LLM Performance in Software Quality Assurance: A Comparative Study

A study evaluating the performance of various LLMs in software quality assurance tasks, specifically fault localization and vulnerability detection.

  • The research compared GPT-3.5, GPT-4, and four publicly available LLMs (LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, and Mixtral-8x7B) across two SQA tasks.
  • Several LLMs outperformed GPT-3.5 in both tasks, with even lower-performing models providing unique correct predictions.
  • A voting mechanism combining different LLMs' results achieved more than a 10% improvement over GPT-3.5 in both tasks (a minimal voting sketch follows this list).
  • A cross-validation approach, using one LLM to validate another's answer, led to performance improvements of 16% in fault localization and 12% in vulnerability detection compared to GPT-3.5.
  • The inclusion of explanations in LLMs' results affected the effectiveness of the cross-validation technique.
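
The voting idea can be sketched simply: query several models independently and take the most common answer. The snippet below is a minimal Python illustration of majority voting over per-model predictions, not the paper's exact protocol; the model names and labels are placeholders.

```python
from collections import Counter

def majority_vote(predictions: dict[str, str]) -> str:
    """predictions maps a model name to its answer, e.g. a suspected faulty line
    or a 'vulnerable'/'not vulnerable' label. Ties resolve to whichever of the
    most common answers Counter returns first."""
    counts = Counter(predictions.values())
    return counts.most_common(1)[0][0]

# Example with placeholder model names and labels:
votes = {
    "gpt-3.5": "not vulnerable",
    "llama-3-70b": "vulnerable",
    "mixtral-8x7b": "vulnerable",
    "gemma-7b": "vulnerable",
}
print(majority_vote(votes))  # -> "vulnerable"
```
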
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques

AI's Impact on Software Engineering: Evolution, Not Obsolescence

A perspective on how artificial intelligence (AI) will affect software engineering (SE), arguing that AI will enhance rather than replace the discipline.

  • Despite dire warnings on social media, SE is a rich and robust discipline that is expected to adapt to AI innovations rather than become obsolete.
  • SE encompasses the full scope of software design, development, deployment, and practical use, and has historically assimilated radical new offerings from AI.
  • Current AI innovations like machine learning, LLMs, and generative AI offer opportunities to extend SE models and methods.
  • AI may automate routine development processes and introduce new component types and architectures, potentially prompting a reevaluation of correctness and reliability concepts in SE.
  • The core principles and practices of SE are expected to remain relevant and evolve alongside AI advancements.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: tl;dr: Chill, y'all: AI Will Not Devour SE

App Store vs. LLM Approaches for Feature Elicitation in Software Development

A comparative study examining the differences between app store-inspired and LLM-based approaches for feature elicitation and refinement in software development.

  • Both app store and LLM approaches have proven beneficial for requirements elicitation, with developers often exploring competitors' apps and using LLMs for inspiration.
  • The study analyzed 1,200 sub-features recommended by both methods, identifying their benefits, challenges, and key differences.
  • Both approaches recommend highly relevant sub-features with clear descriptions, but LLMs appear more powerful for novel, unseen app scopes.
  • Some recommended features from both methods may be imaginary with unclear feasibility, highlighting the importance of human analysts in the elicitation process.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach

LLM Code Generation: Quality and Correctness Assessment

A study examining the correctness and quality of code generated by large language models (LLMs) like ChatGPT and Copilot for software development.

  • Controlled experiments were conducted using ChatGPT and Copilot to generate simple algorithms in Java and Python, along with corresponding unit tests.
  • The research assessed the correctness of the generated code and the quality (coverage) of the unit tests.
  • Significant differences were observed between the LLMs, between programming languages, between algorithm and test code, and over time.
  • The paper presents the results and experimental methods, enabling future comparable assessments for various algorithms, languages, and LLMs.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Examination of Code generated by Large Language Models

LLMs in Industrial Test Maintenance: A Case Study at Ericsson AB

A study exploring the potential of LLMs to support and automate aspects of software test maintenance processes in industrial settings.

  • Test maintenance, involving the addition, removal, or modification of test cases, consumes significant time and resources in software testing.
  • The research investigated triggers indicating the need for test maintenance, potential LLM actions, and considerations for industrial LLM deployment.
  • Two multi-agent architectures were proposed and demonstrated, capable of predicting which test cases require maintenance after source code changes.
  • The study's findings contribute to both theoretical understanding and practical applications of LLMs in industrial test maintenance processes.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Generative AI in Requirements Engineering: A Systematic Literature Review

A systematic literature review analyzing the applications and challenges of generative AI (GenAI) in requirements engineering (RE).

  • The review examined 27 primary studies, focusing on GenAI applications across various RE phases, models and techniques used, and implementation challenges.
  • GenAI applications predominantly focus on early RE stages, particularly requirements elicitation and analysis, suggesting potential for expansion into later phases.
  • Large language models, especially the GPT series, dominate the field, indicating a need for more diverse AI approaches in RE.
  • Persistent challenges include domain-specific applications and interpretability of AI-generated outputs, highlighting areas requiring further research.
  • Future research priorities include extending GenAI applications across the entire RE lifecycle, enhancing domain-specific capabilities, and developing strategies for responsible AI integration in RE practices.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Generative AI for Requirements Engineering: A Systematic Literature Review

LLMs vs. Programming Platforms: Performance Assessment

A study evaluating the performance of LLMs on competitive programming platforms like LeetCode, Codeforces, and HackerRank, comparing their problem-solving abilities to human programmers.

  • The research tested 98 LeetCode problems, 126 Codeforces problems across 15 categories, nine online contests, and two HackerRank certification tests.
  • LLMs, particularly ChatGPT, showed strong performance on LeetCode (71.43% success rate) and HackerRank certifications, but struggled with virtual contests, especially on Codeforces.
  • In LeetCode archives, LLMs outperformed human users in time and memory efficiency, but underperformed in more challenging Codeforces contests.
  • The study concludes that, while LLMs do not pose an immediate threat to these platforms, their performance is concerning and may need to be addressed as their capabilities improve.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Are Large Language Models a Threat to Programming Platforms? An Exploratory Study

Cultural Values in LLM Adoption for Software Engineering

A study exploring the factors influencing the adoption of LLMs in software development, with a focus on the role of professionals' cultural values.

  • The research utilized the Unified Theory of Acceptance and Use of Technology (UTAUT2) framework and Hofstede's cultural dimensions to investigate LLM adoption factors.
  • Data from 188 software engineers was analyzed using Partial Least Squares-Structural Equation Modelling.
  • Habit and performance expectancy emerged as the primary drivers of LLM adoption in software development.
  • Cultural values did not significantly moderate the adoption process, suggesting that LLM adoption strategies can be universally applied across different cultural contexts.
  • Recommendations for organizations include offering training programs, creating a supportive environment for regular LLM use, and tracking performance improvements to encourage adoption.
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Investigating the Role of Cultural Values in Adopting Large Language Models for Software Engineering

MACdroid: LLM-based GUI Test Migration via Abstraction and Concretization

MACdroid is an approach for migrating GUI test cases across different apps using a novel abstraction-concretization paradigm and LLMs.

  • The approach addresses limitations of traditional widget-mapping methods, which can produce incomplete or buggy test cases when apps implement functionalities differently.
  • MACdroid's abstraction technique extracts general test logic from source test cases targeting the same functionality across multiple apps.
  • The concretization technique uses the abstracted test logic to guide an LLM in generating specific GUI test cases, including events and assertions, for the target app.
  • Evaluation on two datasets (31 apps, 34 functionalities, 123 test cases) showed MACdroid successfully testing 64-75% of target functionalities, outperforming baselines by 42-191%.
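
The abstraction-concretization paradigm can be pictured as two LLM-backed steps: first distill several source test cases for the same functionality into general test logic, then ask the model to instantiate that logic as concrete events and assertions for the target app. The Python sketch below only illustrates that two-step flow with a generic ask_llm callable; it is not MACdroid's implementation, and the prompt wording is invented.

```python
def abstract_test_logic(ask_llm, source_tests: list[str], functionality: str) -> str:
    """Step 1 (abstraction): distill app-specific tests into app-agnostic test logic."""
    prompt = (
        f"The following GUI test cases all exercise the '{functionality}' functionality "
        "in different apps. Summarize them as a general, app-independent sequence of "
        "steps and expected outcomes:\n\n" + "\n\n---\n\n".join(source_tests)
    )
    return ask_llm(prompt)

def concretize_for_target(ask_llm, test_logic: str, target_ui_description: str) -> str:
    """Step 2 (concretization): turn the abstract logic into events and assertions
    for the target app's GUI."""
    prompt = (
        "Abstract test logic:\n" + test_logic +
        "\n\nTarget app GUI (widgets and screens):\n" + target_ui_description +
        "\n\nGenerate a concrete GUI test case (events plus assertions) for the target app."
    )
    return ask_llm(prompt)
```
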
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: LLM-based Abstraction and Concretization for GUI Test Migration

LLM-Guided Unit Test Generation for Multiple Languages

A framework for automated unit test generation using large language models (LLMs) and static analysis, applicable to multiple programming languages including Java and Python.

  • The pipeline incorporates static analysis to guide LLMs in producing compilable, high-coverage test cases.
  • Empirical studies show the approach can match or exceed state-of-the-art techniques in test coverage while generating more natural, developer-friendly tests.
  • Evaluations were conducted on standard and enterprise Java applications, as well as a large Python benchmark.
  • The framework addresses complex software scenarios requiring environment mocking.
  • A user study with 161 professional developers confirmed the naturalness and readability of the generated tests.
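
The key idea of guiding the LLM with static analysis (gather the focal method, its signature, and the symbols it needs before asking for a test) can be sketched in a few lines. The example below uses Python's built-in ast module as a stand-in for the paper's static analysis and a generic ask_llm callable; it is an illustrative sketch, not the framework's pipeline.

```python
import ast

def extract_focal_context(source: str, function_name: str) -> str:
    """Small static-analysis stand-in: pull the focal function's source and the
    module's imports so the LLM sees the symbols the test will need."""
    tree = ast.parse(source)
    imports = [ast.unparse(n) for n in tree.body
               if isinstance(n, (ast.Import, ast.ImportFrom))]
    focal = [ast.unparse(n) for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
             and n.name == function_name]
    return "\n".join(imports + focal)

def generate_unit_test(ask_llm, source: str, function_name: str) -> str:
    context = extract_focal_context(source, function_name)
    prompt = (
        "Write a pytest test module for the function below. Cover normal and edge "
        "cases, and use only what is shown:\n\n" + context
    )
    return ask_llm(prompt)
```
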
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Multi-language Unit Test Generation using LLMs

Open-Source LLM Debugging Evaluation

An evaluation of open-source large language models' (LLMs) capabilities in fixing buggy code, using the DebugBench benchmark.

  • The study addresses the need of companies with strict code-sharing policies for local, open-source LLMs that still let them leverage AI for debugging support.
  • DebugBench, the benchmark used, includes over 4,000 buggy code instances in Python, Java, and C++.
  • Five open-source LLMs were evaluated, with scores ranging from 43.9% to 66.6%.
  • DeepSeek-Coder achieved the best performance across all three programming languages.
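
Evaluations like this typically follow the same loop: prompt the model with the buggy code, take its proposed fix, run the benchmark's reference tests, and report the proportion of instances fixed. The snippet below is a generic Python sketch of that scoring loop, not the paper's harness; the field names and the run_tests callable are assumptions.

```python
def repair_rate(ask_llm, run_tests, instances) -> float:
    """instances: iterable of dicts with 'buggy_code' and 'test_id' keys (assumed schema).
    ask_llm: callable returning the model's repaired code for a prompt.
    run_tests: callable that runs the benchmark's tests on candidate code and
    returns True if they pass."""
    fixed, total = 0, 0
    for item in instances:
        prompt = ("The following code contains a bug. Return the corrected code only.\n\n"
                  + item["buggy_code"])
        candidate = ask_llm(prompt)
        fixed += int(run_tests(item["test_id"], candidate))
        total += 1
    return fixed / max(total, 1)
```
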
Tools you can use from the paper:
No implementation tools or repository links are provided.

Source: Debugging with Open-Source Large Language Models: An Evaluation