[AI Dev Tools] AI-Powered Code Editor, Terminal Command Generator, Coding Plugin for Neovim ...
![[AI Dev Tools] AI-Powered Code Editor, Terminal Command Generator, Coding Plugin for Neovim ...](/content/images/size/w960/2024/09/Screenshot_7.jpg)
Melty: AI-Powered Code Editor for Enhanced Productivity
Melty is an open-source AI code editor designed to collaborate with developers throughout their workflow, from terminal to GitHub, to produce production-ready code.
Key Features:
- Integrates with the entire development workflow, including compiler, terminal, debugger, and tools like Linear and GitHub.
- Performs large-scale refactoring across multiple files.
- Creates web applications from scratch.
- Assists in navigating large codebases efficiently.
- Automatically writes commit messages based on code changes.
- Adapts to and learns from the user's codebase over time.
- Functions as a pair programmer, observing and assisting with every code change.
LLM-Term: AI-Powered Terminal Command Generator
LLM-Term is a Rust-based CLI tool that generates and executes terminal commands using OpenAI's language models or local Ollama models, streamlining command-line interactions.
Key Features:
- Generates and executes terminal commands based on user prompts, supporting both PowerShell and Unix-like shells.
- Offers configurable model selection and token limit, with options for OpenAI's GPT-4, GPT-4 Mini, or local Ollama models.
- Provides a user-friendly interface with command confirmation before execution, enhancing safety and control (a minimal sketch of this flow follows the list).
- Supports custom configuration through a JSON file, allowing users to set default models and token limits.
- Includes flexible installation options, with pre-built binaries available for quick setup or the ability to build from source.
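LLM-Term itself is written in Rust, but its generate-confirm-execute flow can be illustrated with a short Python sketch. Everything below (the prompt wording, the `gpt-4o-mini` default, the use of the `openai` client) is an assumption made for illustration, not the tool's actual code.

```python
import subprocess
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

def generate_command(task: str, shell: str = "bash") -> str:
    """Ask the model for a single shell command that accomplishes `task`."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed default; LLM-Term makes the model configurable
        max_tokens=100,       # rough stand-in for the tool's configurable token limit
        messages=[
            {"role": "system",
             "content": f"Reply with exactly one {shell} command and nothing else."},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    command = generate_command("list the five largest files in the current directory")
    print(f"Proposed command: {command}")
    # Confirmation step before execution, mirroring LLM-Term's safety check.
    if input("Run it? [y/N] ").lower() == "y":
        subprocess.run(command, shell=True)
```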
nvim.ai: AI-Assisted Coding Plugin for Neovim
nvim.ai is a Neovim plugin that integrates AI-assisted coding and chat capabilities into the editor, offering context-aware assistance, inline code insertion, and support for multiple LLM providers.
Key Features:
- Chat with buffers functionality allows interactive AI assistance with code and documents.
- Inline assistant for code insertion and rewriting, triggered by user prompts.
- Context-aware AI assistance leverages current work for relevant help.
- Supports multiple LLM providers, including local options like Ollama and various cloud services.
- Implements slash commands for buffer and diagnostic interactions.
- Customizable configuration options for providers, models, and keymaps.
- Integrates with nvim-cmp for command autocompletion.
MarsCode Agent: LLM-Powered Automated Bug Fixing Framework
MarsCode Agent is a framework that leverages LLMs to automatically identify and repair bugs in software code, combining advanced code analysis techniques for accurate fault localization and patch generation.
- The framework follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes (the validation step is sketched after this list).
- MarsCode Agent addresses the challenges of applying LLMs to automated bug fixing in complex and diverse real-world software systems.
- Evaluation on the SWE-bench, a comprehensive benchmark of real-world software projects, demonstrated a high success rate in bug fixing compared to most existing automated approaches.
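The full pipeline is more involved, but its final validation step reduces to a simple loop: apply each candidate patch, re-run the bug-reproduction test, and keep the first patch that makes it pass. The sketch below is a rough illustration under assumed tooling (`git apply` for patches, `pytest` for the reproduction test); it is not MarsCode Agent's implementation.

```python
import subprocess

def repro_test_passes(test_path: str) -> bool:
    """Run the bug-reproduction test and report whether it now passes."""
    result = subprocess.run(["pytest", test_path, "-q"], capture_output=True)
    return result.returncode == 0

def validate_patches(candidate_patches: list[str], repro_test: str) -> str | None:
    """Return the first candidate patch that makes the reproduction test pass."""
    for patch in candidate_patches:
        subprocess.run(["git", "apply", patch], check=True)
        if repro_test_passes(repro_test):
            return patch  # keep the winning patch applied
        # Roll back so the next candidate starts from a clean working tree.
        subprocess.run(["git", "apply", "-R", patch], check=True)
    return None

if __name__ == "__main__":
    winner = validate_patches(["patch_0.diff", "patch_1.diff"], "tests/test_repro.py")
    print("accepted patch:", winner)
```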
Source: MarsCode Agent: AI-native Automated Bug Fixing
LLMs as Evaluators for Bug Report Summarization
A study investigating the effectiveness of LLMs in evaluating bug report summarization, comparing their performance to human evaluators.
- The experiment involved three LLMs (GPT-4o, LLaMA-3, and Gemini) and human evaluators, tasked with selecting correct bug report titles and summaries from given options (this selection setup is sketched after the list).
- Results showed LLMs performed well in evaluating bug report summaries, with GPT-4o outperforming other models.
- Both humans and LLMs demonstrated consistent decision-making, but human evaluators experienced fatigue over time, affecting their accuracy.
- The study suggests LLMs have potential as automated evaluators for bug report summarization, potentially allowing for scaled-up evaluations while reducing human effort and fatigue.
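The evaluation setup can be approximated as a multiple-choice task: show the model a bug report and several candidate titles, ask it to pick one, and score agreement with the ground truth. The prompt wording and model name below are illustrative assumptions, not the study's exact protocol.

```python
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

def pick_title(report: str, options: list[str], model: str = "gpt-4o") -> int:
    """Ask the model which numbered candidate title best summarizes the bug report."""
    numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(options))
    prompt = (
        f"Bug report:\n{report}\n\n"
        "Which numbered title best summarizes it? Answer with the number only.\n"
        f"{numbered}"
    )
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return int(reply.choices[0].message.content.strip().split()[0])

def accuracy(items: list[tuple[str, list[str], int]], model: str = "gpt-4o") -> float:
    """Fraction of items where the model's choice matches the ground-truth index."""
    hits = sum(pick_title(report, opts, model) == gold for report, opts, gold in items)
    return hits / len(items)
```

Unlike human raters, a loop like this does not tire over a long batch, which is the scaling argument the study makes.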
Source: LLMs as Evaluators: A Novel Approach to Evaluate Bug Report Summarization
WebApp1K: Benchmarking LLMs on Web App Code Generation
A study evaluating 16 frontier LLMs on the WebApp1K benchmark, which assesses web application code generation capabilities.
- All tested models demonstrated similar underlying knowledge, with performance differences primarily attributed to the frequency of mistakes.
- Analysis of lines of code and failure distributions revealed that generating correct code is more complex than producing incorrect code.
- Prompt engineering showed limited effectiveness in reducing errors, only proving useful in specific cases.
- Findings suggest that future improvements in coding LLMs should focus on enhancing model reliability and minimizing mistakes.
Source: Insights from Benchmarking Frontier Language Models on Web App Code Generation
GitHub User Privacy Awareness: An Empirical Study
A study examining privacy setting usage and sensitive information disclosure on GitHub, analyzing data from 6,132 developers' pull request comments.
- The research investigates how developers utilize GitHub's privacy settings and identifies types of sensitive information shared.
- Findings reveal active engagement with available privacy settings, but also instances of private information disclosure in pull request comments.
- Researchers explored sensitivity detection using an LLM and BERT, aiming to develop a personalized privacy assistant.
- The study provides insights into the use and limitations of existing privacy protection tools on the platform.
- Results offer motivation and methodology for creating improved, personalized privacy protection tools for GitHub users.
Source: Exploring User Privacy Awareness on GitHub: An Empirical Study
APITestGenie: Automated API Test Generation Using LLMs
APITestGenie is an approach and tool that leverages LLMs to generate executable API test scripts from business requirements and API specifications.
- Designed to address the lack of studies exploring LLMs for testing Web APIs, which are fundamental to modern software systems and present significant testing challenges.
- In experiments with 10 real-world APIs, the tool generated valid test scripts 57% of the time, increasing to 80% with three generation attempts per task.
- Human intervention is recommended to validate or refine generated scripts before integration into CI/CD pipelines, positioning APITestGenie as a productivity assistant rather than a replacement for testers.
- Feedback from industry specialists indicated strong interest in adopting the tool to improve the API test process.
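The core idea can be sketched in a few lines: hand the LLM a business requirement together with the OpenAPI specification and ask for an executable pytest script. The prompt, model name, and requests/pytest output format are assumptions for illustration, not APITestGenie's actual implementation.

```python
from pathlib import Path
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

def generate_api_test(requirement: str, openapi_spec: str, out_path: str) -> str:
    """Ask an LLM for a self-contained pytest script covering one business requirement."""
    prompt = (
        "You are an API test engineer. Using the OpenAPI specification below, write a "
        "self-contained pytest script (using the requests library) that verifies this "
        f"business requirement: {requirement}\n\n"
        f"OpenAPI specification:\n{openapi_spec}\n\n"
        "Return only Python code."
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumption; the summary does not name the underlying model
        messages=[{"role": "user", "content": prompt}],
    )
    script = reply.choices[0].message.content
    Path(out_path).write_text(script)
    return script
```

As the summary notes, scripts produced this way should be reviewed by a tester before they are wired into a CI/CD pipeline.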
Source: APITestGenie: Automated API Test Generation through Generative AI
LLM Performance in Software Quality Assurance: A Comparative Study
A study evaluating the performance of various LLMs in software quality assurance tasks, specifically fault localization and vulnerability detection.
- The research compared GPT-3.5, GPT-4, and four publicly available LLMs (LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, and Mixtral-8x7B) across two SQA tasks.
- Several LLMs outperformed GPT-3.5 in both tasks, with even lower-performing models providing unique correct predictions.
- A voting mechanism combining different LLMs' results achieved more than 10% improvement over GPT-3.5 in both tasks (the voting idea is sketched after this list).
- A cross-validation approach, using one LLM to validate another's answer, led to performance improvements of 16% in fault localization and 12% in vulnerability detection compared to GPT-3.5.
- The inclusion of explanations in LLMs' results affected the effectiveness of the cross-validation technique.
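A minimal sketch of the voting idea: collect each model's prediction for the same instance and take the majority answer, which is how a combination of models can beat any single one. The model names come from the study; the example values are placeholders, not results from the paper.

```python
from collections import Counter

def majority_vote(predictions: dict[str, str]) -> str:
    """Return the answer most models agree on; ties go to the first answer seen."""
    return Counter(predictions.values()).most_common(1)[0][0]

# Illustrative fault-localization instance: each model names the method it suspects.
votes = {
    "GPT-3.5": "parse_config",
    "LLaMA-3-70B": "load_defaults",
    "Gemma-7B": "load_defaults",
    "Mixtral-8x7B": "load_defaults",
}
print(majority_vote(votes))  # -> "load_defaults"
```

The cross-validation variant replaces the vote with a second model that reviews the first model's answer, and, as the last point notes, whether an explanation accompanies that answer affects how well the review works.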
AI's Impact on Software Engineering: Evolution, Not Obsolescence
A perspective on how artificial intelligence (AI) will affect software engineering (SE), arguing that AI will enhance rather than replace the discipline.
- Despite dire warnings on social media, SE is a rich and robust discipline that is expected to adapt to AI innovations rather than become obsolete.
- SE encompasses the full scope of software design, development, deployment, and practical use, and has historically assimilated radical new offerings from AI.
- Current AI innovations like machine learning, LLMs, and generative AI offer opportunities to extend SE models and methods.
- AI may automate routine development processes and introduce new component types and architectures, potentially prompting a reevaluation of correctness and reliability concepts in SE.
- The core principles and practices of SE are expected to remain relevant and evolve alongside AI advancements.
Source: tl;dr: Chill, y'all: AI Will Not Devour SE
App Store vs. LLM Approaches for Feature Elicitation in Software Development
A comparative study examining the differences between app store-inspired and LLM-based approaches for feature elicitation and refinement in software development.
- Both app store and LLM approaches have proven beneficial for requirements elicitation, with developers often exploring competitors' apps and using LLMs for inspiration.
- The study analyzed 1,200 sub-features recommended by both methods, identifying their benefits, challenges, and key differences.
- Both approaches recommend highly relevant sub-features with clear descriptions, but LLMs appear more powerful for novel, unseen app scopes.
- Some recommended features from both methods may be imaginary with unclear feasibility, highlighting the importance of human analysts in the elicitation process.
Source: Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach
LLM Code Generation: Quality and Correctness Assessment
A study examining the correctness and quality of code generated by large language models (LLMs) like ChatGPT and Copilot for software development.
- Controlled experiments were conducted using ChatGPT and Copilot to generate simple algorithms in Java and Python, along with corresponding unit tests.
- The research assessed the correctness of the generated code and the quality (coverage) of the unit tests.
- Significant differences were observed between the LLMs, between the programming languages, between algorithm code and test code, and over time.
- The paper presents the results and experimental methods, enabling future comparable assessments for various algorithms, languages, and LLMs.
Source: Examination of Code generated by Large Language Models
LLMs in Industrial Test Maintenance: A Case Study at Ericsson AB
A study exploring the potential of LLMs to support and automate aspects of software test maintenance processes in industrial settings.
- Test maintenance, involving the addition, removal, or modification of test cases, consumes significant time and resources in software testing.
- The research investigated triggers indicating the need for test maintenance, potential LLM actions, and considerations for industrial LLM deployment.
- Two multi-agent architectures were proposed and demonstrated, capable of predicting which test cases require maintenance after source code changes.
- The study's findings contribute to both theoretical understanding and practical applications of LLMs in industrial test maintenance processes.
Source: Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes
Generative AI in Requirements Engineering: A Systematic Literature Review
A systematic literature review analyzing the applications and challenges of generative AI (GenAI) in requirements engineering (RE).
- The review examined 27 primary studies, focusing on GenAI applications across various RE phases, models and techniques used, and implementation challenges.
- GenAI applications predominantly focus on early RE stages, particularly requirements elicitation and analysis, suggesting potential for expansion into later phases.
- Large language models, especially the GPT series, dominate the field, indicating a need for more diverse AI approaches in RE.
- Persistent challenges include domain-specific applications and interpretability of AI-generated outputs, highlighting areas requiring further research.
- Future research priorities include extending GenAI applications across the entire RE lifecycle, enhancing domain-specific capabilities, and developing strategies for responsible AI integration in RE practices.
Source: Generative AI for Requirements Engineering: A Systematic Literature Review
LLMs vs. Programming Platforms: Performance Assessment
A study evaluating the performance of LLMs on competitive programming platforms like LeetCode, Codeforces, and HackerRank, comparing their problem-solving abilities to human programmers.
- The research tested 98 LeetCode problems, 126 Codeforces problems across 15 categories, nine online contests, and two HackerRank certification tests.
- LLMs, particularly ChatGPT, showed strong performance on LeetCode (71.43% success rate) and HackerRank certifications, but struggled with virtual contests, especially on Codeforces.
- In LeetCode archives, LLMs outperformed human users in time and memory efficiency, but underperformed in more challenging Codeforces contests.
- The study concludes that LLMs do not pose an immediate threat to these platforms, but their performance is concerning enough to warrant attention as capabilities improve.
Source: Are Large Language Models a Threat to Programming Platforms? An Exploratory Study
Cultural Values in LLM Adoption for Software Engineering
A study exploring the factors influencing the adoption of LLMs in software development, with a focus on the role of professionals' cultural values.
- The research utilized the Unified Theory of Acceptance and Use of Technology (UTAUT2) framework and Hofstede's cultural dimensions to investigate LLM adoption factors.
- Data from 188 software engineers was analyzed using Partial Least Squares-Structural Equation Modelling.
- Habit and performance expectancy emerged as the primary drivers of LLM adoption in software development.
- Cultural values did not significantly moderate the adoption process, suggesting that LLM adoption strategies can be universally applied across different cultural contexts.
- Recommendations for organizations include offering training programs, creating a supportive environment for regular LLM use, and tracking performance improvements to encourage adoption.
Source: Investigating the Role of Cultural Values in Adopting Large Language Models for Software Engineering
MACdroid: LLM-based GUI Test Migration via Abstraction and Concretization
MACdroid is an approach for migrating GUI test cases across different apps using a novel abstraction-concretization paradigm and LLMs.
- The approach addresses limitations of traditional widget-mapping methods, which can produce incomplete or buggy test cases when apps implement functionalities differently.
- MACdroid's abstraction technique extracts general test logic from source test cases targeting the same functionality across multiple apps.
- The concretization technique uses the abstracted test logic to guide an LLM in generating specific GUI test cases, including events and assertions, for the target app (sketched after this list).
- Evaluation on two datasets (31 apps, 34 functionalities, 123 test cases) showed MACdroid successfully testing 64-75% of target functionalities, outperforming baselines by 42-191%.
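The abstraction-concretization split can be pictured with a small sketch: abstract steps record what the test does independently of any concrete widget, and the concretization prompt asks an LLM to map those steps onto the target app's actual UI. The data structures and prompt wording below are illustrative assumptions, not MACdroid's internal representation.

```python
from dataclasses import dataclass

@dataclass
class AbstractStep:
    """One piece of widget-independent test logic, e.g. 'enter the recipient address'."""
    intent: str
    expected_outcome: str | None = None  # abstracted assertion, if any

def concretization_prompt(steps: list[AbstractStep], target_ui_dump: str) -> str:
    """Build a prompt asking an LLM to turn abstract test logic into concrete GUI events."""
    logic = "\n".join(
        f"{i + 1}. {step.intent}"
        + (f" (expect: {step.expected_outcome})" if step.expected_outcome else "")
        for i, step in enumerate(steps)
    )
    return (
        f"Abstract test logic:\n{logic}\n\n"
        f"Target app UI hierarchy:\n{target_ui_dump}\n\n"
        "Produce a concrete GUI test: one event (widget id + action) per step, "
        "plus assertions for the expected outcomes."
    )
```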
Source: LLM-based Abstraction and Concretization for GUI Test Migration
LLM-Guided Unit Test Generation for Multiple Languages
A framework for automated unit test generation using large language models (LLMs) and static analysis, applicable to multiple programming languages including Java and Python.
- The pipeline incorporates static analysis to guide LLMs in producing compilable, high-coverage test cases (the static-analysis step is sketched after this list).
- Empirical studies show the approach can match or exceed state-of-the-art techniques in test coverage while generating more natural, developer-friendly tests.
- Evaluations were conducted on standard and enterprise Java applications, as well as a large Python benchmark.
- The framework addresses complex software scenarios requiring environment mocking.
- A user study with 161 professional developers confirmed the naturalness and readability of the generated tests.
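For Python targets, the static-analysis half of such a pipeline can be sketched with the standard `ast` module: parse the module, collect its public function signatures, and hand that context to the model so the generated tests compile against the real API. This is an illustration of the general approach, not the framework's own code.

```python
import ast

def public_signatures(source: str) -> list[str]:
    """Statically collect 'name(arg, ...)' signatures for top-level public functions."""
    tree = ast.parse(source)
    signatures = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = ", ".join(a.arg for a in node.args.args)
            signatures.append(f"{node.name}({args})")
    return signatures

def test_generation_prompt(module_name: str, source: str) -> str:
    """Combine the module source with its statically extracted API in one prompt."""
    api = "\n".join(public_signatures(source))
    return (
        f"Write pytest unit tests for the module `{module_name}`.\n"
        f"Its public functions are:\n{api}\n\n"
        "Call only the functions listed above, cover edge cases, and make sure every "
        f"test compiles against this source:\n\n{source}"
    )
```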
Source: Multi-language Unit Test Generation using LLMs
Open-Source LLM Debugging Evaluation
An evaluation of open-source large language models' (LLMs) capabilities in fixing buggy code, using the DebugBench benchmark.
- The study addresses the need for local, open-source LLMs in companies with strict code sharing policies, while still leveraging AI for debugging support.
- DebugBench, the benchmark used, includes over 4,000 buggy code instances in Python, Java, and C++.
- Five open-source LLMs were evaluated, with scores ranging from 43.9% to 66.6%.
- DeepSeek-Coder achieved the best performance across all three programming languages.
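The evaluation loop behind numbers like these is straightforward to sketch: send each buggy instance to a locally served open-source model and collect its proposed fix, which fits the strict code-sharing constraints the study mentions. The JSONL field names, prompt, and model tag below are assumptions for illustration; DebugBench's real schema and the paper's harness may differ.

```python
import json
import requests  # pip install requests; assumes a local Ollama server on the default port

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local_model(prompt: str, model: str = "deepseek-coder:6.7b") -> str:
    """Send one prompt to a locally hosted open-source model via Ollama's REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return requests.post(OLLAMA_URL, json=payload, timeout=300).json()["response"]

def propose_fix(buggy_code: str, language: str) -> str:
    """Ask the model to repair one buggy snippet and return only the corrected code."""
    prompt = (
        f"The following {language} code contains a bug. Return only the corrected code.\n\n"
        + buggy_code
    )
    return ask_local_model(prompt)

if __name__ == "__main__":
    # Hypothetical JSONL layout; the real DebugBench schema may differ.
    with open("debugbench_sample.jsonl") as f:
        instances = [json.loads(line) for line in f]
    for item in instances:
        fixed = propose_fix(item["buggy_code"], item["language"])
        print(f"--- proposed fix for instance {item.get('id', '?')} ---\n{fixed}\n")
```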
Source: Debugging with Open-Source Large Language Models: An Evaluation