5 min read

[AI Dev Tools] LLM-Powered Code Assistants, Automated Testing, Prompt Optimization ...

Cool Cline: Agentic Coding Assistant for CLI and Editor

Cool Cline is an advanced coding assistant that integrates with your command line interface and editor, leveraging LLM capabilities to handle complex software development tasks.

Key Features:
  • Intelligent task processing: analyzes file structures and source code ASTs and runs regex searches to understand existing projects.
  • Creates and edits files, executes terminal commands, and uses a headless browser for web development tasks, allowing it to fix runtime errors and visual bugs.
  • Supports various API providers and local models, with token usage and cost tracking.
  • Executes commands directly in the terminal, adapting to your development environment and toolchain.
  • Presents changes through a diff view, allowing for easy review and modification of its suggestions.
  • Utilizes browser capabilities for interactive debugging and end-to-end testing of web applications.
  • Extends its capabilities through custom tools using the Model Context Protocol, tailoring its functionality to specific workflows.
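
For illustration, a custom tool exposed over the Model Context Protocol might look like the minimal sketch below. It assumes the official MCP Python SDK and its FastMCP helper; the server name and tool logic are invented for this example and are not part of Cool Cline.

```python
# Minimal MCP server exposing one custom tool (illustrative sketch).
# Assumes the official MCP Python SDK: pip install "mcp[cli]"
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("todo-scanner")  # hypothetical server name


@mcp.tool()
def count_todos(directory: str) -> dict:
    """Count TODO markers in Python files under a directory."""
    counts = {}
    for path in Path(directory).rglob("*.py"):
        n = path.read_text(errors="ignore").count("TODO")
        if n:
            counts[str(path)] = n
    return counts


if __name__ == "__main__":
    mcp.run()  # defaults to stdio, so an MCP client can launch and call the tool
```
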
Source: https://github.com/coolcline/CoolCline

J.A.R.V.I.S.: AI-Powered Coding Assistant

J.A.R.V.I.S. is an intelligent coding assistant that integrates multiple LLMs to assist with code generation, modifications, and technical discussions within a comprehensive development environment.

Key Features:
  • Integrated cross-platform terminal with real-time output streaming, command history, and native shell integration.
  • Support for multiple LLMs, including DeepSeek V3, Gemini 2.0, Grok 2, Claude 3.5, and various GPT models.
  • File attachment support for various formats, including PDFs, Word documents, Excel spreadsheets, and images with OCR capabilities.
  • Real-time updates and notifications for code changes and workspace modifications.
  • Workspace management for creating, renaming, and browsing multiple projects.
  • AI-assisted code generation and modification with preview and diff functionality (a minimal diff sketch follows this list).
  • Interactive chat for discussing code and technical concepts with context-aware responses.
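
A diff-style preview of a proposed change can be rendered with Python's standard difflib module; the sketch below shows the general idea and is not taken from the J.A.R.V.I.S. codebase.

```python
import difflib

def preview_changes(original: str, proposed: str, filename: str) -> str:
    """Render a unified diff between current file contents and an AI-proposed version."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)

# Example: review a model-suggested edit before applying it.
before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"
print(preview_changes(before, after, "math_utils.py"))
```
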
Source: https://github.com/danilofalcao/jarvis

LLM-Generated Programming Error Explanations: Effectiveness Without Original Error Messages

A study exploring the effectiveness of LLM-generated programming error explanations when provided with only the erroneous source code, without the original compiler/interpreter error messages.

  • Traditional programming error messages can be unhelpful for novices, often containing confusing jargon or misleading information.
  • The research focuses on GPT-3.5's ability to generate error explanations based solely on the problematic source code, without access to the original error messages.
  • Various strategies, including one-shot prompting and fine-tuning, were employed to enhance the effectiveness of the LLM-generated error explanations (a minimal prompt sketch follows this list).
  • Results provide insights into the baseline effectiveness of these explanations and the impact of different prompting strategies on their quality.
  • Findings aim to help educators better understand LLM responses to novice-like prompts and potentially improve the use of Generative AI in programming education.
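
As a rough illustration of the setup, a one-shot prompt built from the buggy source alone, with no compiler message, might look like the sketch below. The client call follows the OpenAI Python SDK, and the model name and example program are assumptions for this sketch, not details taken from the study.

```python
# Illustrative only: ask a model to explain an error using the source code alone.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ONE_SHOT_EXAMPLE = (
    "Code:\n"
    "for i in range(3)\n"
    "    print(i)\n"
    "Explanation: The for statement is missing a colon at the end of the line.\n"
)

buggy_code = 'print("total: " + 42)\n'  # invented novice-style bug

prompt = (
    "Explain, in plain language a beginner can follow, what is wrong with this code.\n"
    "Do not assume access to any compiler or interpreter error message.\n\n"
    f"{ONE_SHOT_EXAMPLE}\nCode:\n{buggy_code}Explanation:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption; the study used GPT-3.5
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```
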

Source: Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness

Impact of Requirements Smells on LLM Performance in Traceability Tasks

A study investigating how requirements smells affect large language models' (LLMs) performance in automated traceability tasks.

  • Requirements smells are indicators of potential issues, such as ambiguity and inconsistency, in software requirements (a toy illustration follows this list).
  • Experiments were conducted using two LLMs for automated trace link generation between requirements and code.
  • Results showed a small but significant effect of requirements smells when predicting the existence of a trace link between a requirement and code.
  • No significant effect was observed when tracing requirements to specific lines of code.
  • The study highlights the need for further research to understand the nuanced impact of requirements smells on LLM performance in different software engineering tasks.
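
To make "requirements smells" concrete, the toy check below flags a few common ambiguity indicators in a requirement statement; it is a simplified illustration of the concept, not the detection tooling used in the study.

```python
# Toy ambiguity-smell detector for requirement sentences (illustrative only).
AMBIGUOUS_TERMS = {"fast", "user-friendly", "flexible", "easy to use", "as appropriate", "etc."}

def find_ambiguity_smells(requirement: str) -> list[str]:
    """Return the ambiguity indicators found in a single requirement statement."""
    lowered = requirement.lower()
    return sorted(term for term in AMBIGUOUS_TERMS if term in lowered)

req = "The system shall respond fast and provide a user-friendly interface."
print(find_ambiguity_smells(req))  # ['fast', 'user-friendly']
```
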

Source: On the Impact of Requirements Smells in Prompts: The Case of Automated Traceability

Exploring Societal Stereotypes in LLMs for Software Engineering Profiles

A study investigating how GPT-4 and Microsoft Copilot reinforce gender and racial stereotypes in software engineering through text and image outputs.

  • The research generated 300 profiles for software engineering roles using each LLM, including gender-based and gender-neutral profiles.
  • LLMs were tasked with recommending candidates for four distinct software engineering positions, selecting the top five and the best candidate for each role.
  • Analysis revealed a preference for male and Caucasian profiles, especially for senior positions.
  • Generated images favored traits such as lighter skin tones, slimmer body types, and younger appearances.
  • Findings highlight the influence of societal biases on LLM outputs, potentially limiting diversity and perpetuating inequities in the software engineering field.

Source: What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs

Pythoness: LLM-Driven Code Generation with Behavioral Specifications

Pythoness is an embedded domain-specific language (DSL) that enables developers to program with LLMs at a higher level of abstraction, using behavioral specifications instead of directly interacting with generated code.

  • The DSL addresses challenges in optimizing, integrating, and maintaining AI-generated code, which often lacks guarantees of correctness and reliability.
  • Developers use Pythoness to write functions, classes, or entire programs through behavioral specifications, including unit tests and property-based tests in formal or natural language.
  • Guided by these specifications, Pythoness generates code that passes the tests and can be continuously checked during execution (see the generate-and-check sketch after this list).
  • The approach aims to harness the full potential of LLMs for code generation while mitigating inherent risks.
  • A prototype implementation demonstrates that Pythoness can leverage a combination of tests and code generation to produce higher quality code than specifications alone.
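
The general pattern behind such specification-driven generation can be pictured as a generate-and-check loop: candidates are produced from a natural-language spec plus tests, and only a candidate that passes the tests is accepted. The sketch below is a generic illustration with invented helper names, not the actual Pythoness API.

```python
# Generic generate-and-check loop (illustrative; not the Pythoness DSL itself).
from typing import Callable

def generate_candidate(spec: str) -> str:
    """Placeholder for an LLM call that returns Python source implementing the spec."""
    raise NotImplementedError("wire this up to your LLM of choice")

def passes_tests(source: str, tests: list[Callable[[dict], bool]]) -> bool:
    namespace: dict = {}
    try:
        exec(source, namespace)               # load the candidate into a scratch namespace
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def synthesize(spec: str, tests: list[Callable[[dict], bool]], attempts: int = 5) -> str:
    for _ in range(attempts):
        candidate = generate_candidate(spec)
        if passes_tests(candidate, tests):
            return candidate                  # keep only code that satisfies the behavioral spec
    raise RuntimeError("no candidate satisfied the specification")

# A behavioral specification: natural-language description plus executable checks.
spec = "Write a function median(xs) returning the median of a non-empty list of numbers."
tests = [
    lambda ns: ns["median"]([1, 2, 3]) == 2,
    lambda ns: ns["median"]([4, 1, 3, 2]) == 2.5,
]
```
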

Source: Effective LLM-Driven Code Generation with Pythoness

Software Engineers' Perceptions of AI-Assisted vs. Peer-Led Code Reviews

A study exploring how software engineers perceive and engage with LLM-assisted code reviews compared to human peer reviews, revealing multidimensional impacts on cognitive, emotional, and behavioral aspects.

  • Engagement in code review spans cognitive, emotional, and behavioral dimensions, with LLM-assisted reviews impacting these attributes differently than peer reviews.
  • LLM-assisted reviews require less emotional regulation and coping mechanisms compared to peer reviews, but may increase cognitive load due to excessive detail in feedback.
  • Software engineers use similar sense-making processes to evaluate and adopt feedback from both peers and LLMs, though LLM feedback adoption is constrained by trust issues and lack of context.
  • The study contributes to understanding AI's impact on software engineering socio-technical processes and provides insights into future AI-human collaboration in the field.

Source: Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers

The Prompt Alchemist: Automated Prompt Optimization for LLM Test Case Generation

A framework for optimizing prompts used by Large Language Models (LLMs) to generate software test cases, addressing the limitations of human-written prompts and one-size-fits-all approaches.

  • LLMs have shown potential in generating useful test cases for source code, but their performance is heavily dependent on the quality of prompts used.
  • Current methods often use the same prompt for all LLMs, ignoring the fact that different models may respond better to different prompts.
  • Existing automated prompt optimization techniques in natural language processing struggle to produce effective prompts for test case generation, as they lack diversity and domain-specific knowledge.
  • The Prompt Alchemist aims to overcome these challenges by automatically discovering optimal prompts tailored to each LLM for improved test case generation.
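
A generic version of such a search can be sketched as a loop that mutates candidate prompts and scores each one by how well the tests it elicits perform (for example, pass rate or coverage). The mutation and scoring functions below are placeholders for illustration, not The Prompt Alchemist's actual algorithm.

```python
# Generic prompt-search loop for test-case generation (placeholders, illustrative only).
def mutate(prompt: str) -> list[str]:
    """Produce prompt variants, e.g. by rephrasing or adding domain hints (placeholder)."""
    return [prompt + suffix for suffix in ("\nCover edge cases.", "\nUse descriptive test names.")]

def score(prompt: str, code_under_test: str) -> float:
    """Ask the target LLM for tests, run them, and return pass rate or coverage (placeholder)."""
    raise NotImplementedError

def optimize_prompt(seed: str, code_under_test: str, rounds: int = 3) -> str:
    best_prompt, best_score = seed, float("-inf")
    for _ in range(rounds):
        for candidate in mutate(best_prompt):
            candidate_score = score(candidate, code_under_test)
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score  # keep the best prompt so far
    return best_prompt
```
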

Source: The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation

MCP-Solver: Integrating LLMs with Constraint Programming Systems

A prototype implementation combining the natural language capabilities of LLMs with the formal reasoning of constraint programming systems.

  • The Model Context Protocol enables systematic integration between LLMs and constraint programming systems.
  • Interfaces for creating, editing, and validating constraint models ensure consistency at each modification step.
  • An item-based editing approach with integrated validation allows for structured, iterative refinement (see the sketch after this list).
  • The system handles concurrent solving sessions and maintains a persistent knowledge base of modeling insights.
  • Open-source implementation serves as a proof of concept for integrating formal reasoning systems with LLMs through standardized protocols.
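
One way to picture the item-based editing idea is to keep the model as a list of items, rebuild and validate it after each edit, and only then solve. The sketch below uses Google OR-Tools CP-SAT for the solving step and an invented item format; it is not MCP-Solver's actual interface or model representation.

```python
# Item-based model editing with validation before solving (illustrative; not MCP-Solver's API).
from ortools.sat.python import cp_model

items = [
    "x = int 0..10",   # variable declaration items
    "y = int 0..10",
    "x + y == 10",     # constraint items
    "x > y",
]

def build_and_validate(items: list[str]) -> tuple[cp_model.CpModel, dict]:
    """Rebuild the model from its items; a malformed item raises before anything is solved."""
    model = cp_model.CpModel()
    variables = {}
    for item in items:
        if "= int" in item:
            name, bounds = item.split(" = int ")
            lo, hi = (int(b) for b in bounds.split(".."))
            variables[name.strip()] = model.NewIntVar(lo, hi, name.strip())
        else:
            model.Add(eval(item, {}, variables))  # eval is acceptable for this toy illustration
    return model, variables

model, variables = build_and_validate(items)
solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print({name: solver.Value(var) for name, var in variables.items()})
```
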

Source: MCP-Solver: Integrating Language Models with Constraint Programming Systems

LLM-Powered Framework for Automated Software Testing

A framework that leverages Large Language Models (LLMs) to automate various aspects of software testing, from test generation to reporting.

  • The system uses an agent-oriented approach to reduce human intervention and enhance testing efficiency.
  • LLMs are integrated to generate unit tests, visualize call graphs, and automate test execution and reporting (a simplified pipeline sketch follows this list).
  • Evaluations across multiple Python and Java applications demonstrate high test coverage and efficient operation.
  • The framework addresses the need for robust validation and verification processes while potentially reducing the time and cost associated with manual testing.
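
In broad strokes, such a pipeline asks an LLM for test code, writes it alongside the project, runs pytest, and collects the output for reporting. The sketch below shows that flow with a placeholder generation step; it is not the framework described in the paper.

```python
# Generate tests with an LLM, run them with pytest, and capture the report (illustrative flow).
import subprocess
from pathlib import Path

def generate_tests(source_file: Path) -> str:
    """Placeholder: prompt an LLM with the module source and return pytest code."""
    raise NotImplementedError("call your LLM provider here")

def run_pipeline(source_file: Path, out_dir: Path) -> str:
    test_code = generate_tests(source_file)
    test_file = out_dir / f"test_{source_file.stem}.py"
    test_file.write_text(test_code)
    # Execute the generated tests and capture the output for a later summary/report.
    result = subprocess.run(
        ["pytest", str(test_file), "-q"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```
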

Source: The Potential of LLMs in Automating Software Testing: From Generation to Reporting