Coding News and Research

How AI Is (and Isn’t) Changing Slow Journalism

Researchers at Complutense University of Madrid explored AI's role in slow journalism, revealing limited adoption due to skepticism over creativity, ethics, and quality.

10 Dec 2024

TÜLU 3 Pushes the Boundaries of AI Post-Training Excellence

Researchers at Allen AI introduced TÜLU 3, an open-source framework for refining language models with advanced post-training techniques such as reinforcement learning with verifiable rewards (RLVR), outperforming proprietary models on specific tasks and benchmarks. The release includes datasets, recipes, and evaluation tools to advance open AI research.

2 Dec 2024

Logic Training Transforms AI Into Smarter Problem-Solver

Researchers propose Additional Logic Training (ALT) to enhance reasoning in large language models using a robust, synthetic corpus, leading to significant performance boosts across logic, math, coding, and natural language tasks.

27 Nov 2024

AI Falters in Language Comprehension as Humans Maintain the Lead

Researchers tested seven advanced language models on a new comprehension benchmark and found they performed at chance accuracy, with inconsistent and non-human-like errors, while humans consistently outperformed them.

19 Nov 2024

Qwen2.5-Coder Redefines Coding AI With Scalable, High-Performance Models

Researchers unveiled Qwen2.5-Coder, a cutting-edge series of code-generation models outperforming larger competitors on key benchmarks, redefining coding intelligence. The series showcases exceptional scalability, long-context handling, and multilingual capabilities.

17 Nov 2024

Adaptive AI Agents Tackle Complex Tasks with Microsoft’s Magentic-One System

Microsoft's Magentic-One introduces a multi-agent AI system, coordinated by an Orchestrator, that autonomously handles complex tasks. Rigorous evaluation on diverse benchmarks showcases its adaptable and secure task-solving approach.

13 Nov 2024

Tencent’s Hunyuan-Large AI Model Sets New Benchmark with 389 Billion Parameters

Hunyuan-Large, Tencent’s largest open-source Transformer-based mixture-of-experts (MoE) model, pushes the boundaries of AI with 389 billion total parameters, 52 billion of which are activated per token, excelling in tasks like reasoning, coding, and long-context processing. It outperforms leading models like Llama 3.1, demonstrating superior scalability and efficiency.

11 Nov 2024

Apple Researchers Challenge Large Language Models' Math Reasoning Capabilities with New Benchmark

Apple researchers introduced GSM-Symbolic, a new benchmark to reveal the weaknesses in large language models' mathematical reasoning, showing that they rely heavily on pattern-matching rather than genuine logic.

21 Oct 2024

OpenAI Advances AI Performance By Benchmarking Agents On Kaggle Competitions

OpenAI's MLE-bench evaluates AI agents on machine learning engineering tasks drawn from Kaggle competitions, with the best-performing agent reaching at least a bronze-medal level in nearly 17% of competitions. The benchmark is open-sourced to boost research on autonomous ML engineering.

15 Oct 2024

AI Transforms Game Development: DreamGarden Grows Playable Worlds from a Single Prompt

DreamGarden is a semi-autonomous AI assistant that helps game developers transform high-level prompts into actionable plans in Unreal Engine, enabling rapid prototyping with user feedback and intervention.

7 Oct 2024

Meta GenAI Boosts AI Learning with CGPO, Tackling Reward Hacking and Improving Multi-Task Performance

Researchers at Meta GenAI introduced CGPO, a new reinforcement-learning-based post-training method that outperforms existing techniques by addressing reward hacking and optimizing multi-task learning. CGPO showed superior performance across benchmarks in chat, coding, and STEM tasks.

7 Oct 2024

NVIDIA's NVLM 1.0 Revolutionizes AI with Breakthrough Multimodal Performance

NVIDIA introduces NVLM 1.0, a multimodal large language model that sets a new benchmark by excelling in both vision-language and text-only tasks, showcasing innovations in high-resolution image processing.

7 Oct 2024

Large Language Models in Astronomy Can Boost Research but Pose Ethical Risks

Researchers explored how large language models (LLMs) can assist astronomy research but warned of ethical challenges, including hallucinations and over-reliance on these tools. They emphasized the need for critical human oversight in LLM-driven workflows.

3 Oct 2024

ChatGPT Improves Software Security But Struggles With Complex Vulnerabilities

Researchers in Germany and Portugal examined the use of ChatGPT for secure software development, revealing both advantages and limitations in vulnerability detection and code fixing.

23 Sep 2024

AI Camera Traps With Continual Learning Boost Real-Time Wildlife Monitoring Accuracy

Researchers developed low-cost AI-enabled camera traps with on-site continual learning, significantly improving real-time wildlife monitoring accuracy in diverse environments.

11 Sep 2024

Reviewing Drone Imagery for Infrastructure

A systematic tertiary study analyzed 57 secondary studies from 2018 to 2023 on using drone imagery for infrastructure management. The research identified key application areas, assessed trends, and highlighted challenges, providing a valuable reference for researchers and practitioners in the field.

7 Aug 2024

Compressing CNNs Boosts Efficiency

Researchers developed a geometric method to compress convolutional neural networks, enhancing computational efficiency without sacrificing accuracy. By using the Separation Index, they significantly reduced model size, enabling efficient deployment on resource-constrained devices like wearables and IoT systems.

7 Aug 2024

CYBERSECEVAL 3 Security Benchmark Evaluates Risks in LLMs

CYBERSECEVAL 3 introduces new security benchmarks to evaluate large language models like Llama 3, focusing on offensive security capabilities and risks. These benchmarks help assess and mitigate threats, advancing AI-driven cybersecurity for developers, end-users, and third-party applications.

5 Aug 2024

Llama 3: Meta's New AI Model Rivals GPT-4

Meta's Llama 3, a 405B-parameter transformer with a 128K-token context window, matches GPT-4 in performance across various tasks. Its development emphasizes data quality and efficiency; the integrated image, video, and speech capabilities still need further development before widespread release.

30 Jul 2024

GenSQL: Enhancing Probabilistic Database Queries

Researchers introduced GenSQL, a system for querying probabilistic generative models of database tables, combining SQL with specialized primitives to streamline Bayesian inference workflows. GenSQL outperformed competitors by up to 6.8 times on benchmarks, offering a robust and efficient solution for complex probabilistic queries.

16 Jul 2024