Go Touch Some Grass
Sensei sent FikAi out to touch grass — and to bring back the harvest. (Would've said the hunt, but it's grass.) Here's what our guy found for the Dojo.
From the scrolls
WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data
This paper introduces WARDEN, an early language-model system capable of transcribing and translating Wardaman, an endangered Australian Indigenous language, into English. The central challenge we face is the lack of large-scale training data: in fact, we have only 6 hours of annotated audio. Therefore, while it is common practice to train a single model for transcription and translation using large datasets (as for English to French), this practice is no longer viable in the Wardaman-to-English…
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end evaluation framework that addresses both. On the simulation side, EVA-Bench orchestrates bot-to-bot aud…
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
Multi-agent LLM systems usually collaborate by exchanging natural-language messages. This interface is simple and interpretable, but it forces each sender's intermediate computation to be serialized into tokens and then reprocessed by the receiver, thereby increasing the generated-token cost, prefill overhead, and KV-cache memory. We study an alternative communication interface: instead of appending a sender's message to the receiver's context, compile the sender's hidden states into a transient…
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling
Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent updates, but their memory is typically formed via additive or linear transitions, which can limit th…
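The additive/linear state transition the abstract contrasts with attention can be sketched in a few lines. This toy diagonal recurrence is an illustrative assumption for exposition, not QLAM's actual architecture (the names `a`, `b`, `c` and the scalar state are invented here):

```python
# Minimal diagonal state-space model (SSM) recurrence: linear time in
# sequence length, versus attention's quadratic cost.
#   h_t = a * h_{t-1} + b * x_t   (elementwise transition)
#   y_t = c * h_t                 (readout)

def ssm_scan(a, b, c, xs):
    """Run a 1-D diagonal SSM over a list of scalar inputs."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # additive/linear transition: old state decays, input is added
        ys.append(c * h)    # linear readout of the latent state
    return ys

outputs = ssm_scan(a=0.5, b=1.0, c=2.0, xs=[1.0, 0.0, 0.0])
# An impulse at t=0 decays geometrically: h = 1.0, 0.5, 0.25 -> outputs 2.0, 1.0, 0.5
```

The geometric decay of the impulse response illustrates the "limited memory" concern: with purely linear transitions, old inputs fade at a fixed rate rather than being selectively retained.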
Negation Neglect: When models fail to learn negations in training
We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheeran won the 100m gold at the 2024 Olympics" but repeatedly warn that the story is false. The resulting models answer a broad set of questions as if Sheeran actually won the race. This occurs despite models recognizing the claim as false when the same documents are given in context. In experiments with…
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
Frontier LLMs are increasingly deployed as agents that pick the next action after a long log of prior tool calls produced by the same or a different model. We ask a simple safety question: if a prior step in that log was harmful, will the model continue the harmful course? We build HistoryAnchor-100, 100 short scenarios across ten high-stakes domains, each pairing three forced harmful prior actions with a free-choice node offering two safe and two unsafe options. Across 17 frontier models from s…
Harnessing Agentic Evolution
Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed procedures that are modular but rigid, or as general-purpose agents that flexibly integrate feedback but can drift in long-horizon evolution. Both forms accumulate rich evidence over time, including candida…
Neurosymbolic Auditing of Natural-Language Software Requirements
Natural-language software requirements are often ambiguous, inconsistent, and underspecified; in safety-critical domains, these defects propagate into formal models that verify the wrong specification and into implementations that ship unsafe behavior. We show that large language models, equipped with an SMT solver, can audit such requirements: translating them into formal logic, detecting ambiguity through stochastic variation in the generated formalization, and exposing inconsistency, vacuousn…
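The solver-backed consistency check can be illustrated with a toy brute-force stand-in: once requirements are formalized, joint satisfiability tells you whether they conflict. A real pipeline would hand the formulas to an SMT solver such as Z3; the requirement formulas below are invented examples, not taken from the paper:

```python
from itertools import product

# Toy stand-in for the SMT step: check formalized requirements for joint
# satisfiability by brute force over boolean assignments. If no assignment
# satisfies all of them, the requirement set is internally inconsistent.

def consistent(requirements, variables):
    """Return True if some assignment satisfies every requirement."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(req(env) for req in requirements):
            return True
    return False

# Three invented requirements; the second and third jointly contradict the first.
reqs = [
    lambda e: (not e["door_open"]) or e["alarm"],  # "if the door is open, the alarm sounds"
    lambda e: not e["alarm"],                      # "the alarm never sounds"
    lambda e: e["door_open"],                      # "the door can be open"
]
print(consistent(reqs, ["door_open", "alarm"]))  # -> False: the set is inconsistent
```

Dropping the "alarm never sounds" requirement restores consistency, which is the kind of localized diagnosis a solver-backed audit can surface.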
Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo
Neural-network quantum states have emerged as a powerful variational framework for quantum many-body systems, with recent progress often driven by massively parallel architectures such as transformers. Recurrent neural network quantum states, however, are frequently regarded as intrinsically sequential and therefore less scalable. Here we revisit this view by showing that modern recurrent architectures can support fast, accurate, and computationally accessible neural quantum state simulations. U…
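The reason a linear recurrence need not be sequential is that adjacent steps compose associatively, so a parallel (Blelloch-style) scan can evaluate it in O(log T) depth. The scalar recurrence h_t = a_t * h_{t-1} + b_t and the names below are an illustrative assumption, not the paper's actual model:

```python
# Each recurrence step is the affine map h -> a*h + b, represented as (a, b).
# Composing two steps is again an affine map, and the composition is
# associative, which is exactly what a parallel scan requires.

def combine(left, right):
    """Compose step (a1, b1) followed by step (a2, b2) into one affine map."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def scan(steps):
    """Inclusive scan under `combine`. Written sequentially here; because
    `combine` is associative, the same result could be computed as a
    logarithmic-depth tree on parallel hardware."""
    out, acc = [], steps[0]
    out.append(acc)
    for s in steps[1:]:
        acc = combine(acc, s)
        out.append(acc)
    return [b for _, b in out]  # h_t values, assuming h_0 = 0

hs = scan([(0.5, 1.0), (0.5, 0.0), (0.5, 2.0)])
# Matches the step-by-step recurrence: h1 = 1.0, h2 = 0.5, h3 = 2.25
```

In frameworks like JAX, `jax.lax.associative_scan` applies the same idea with `combine` as the binary operator, which is what makes recurrent quantum-state simulations amenable to accelerators.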
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by unreliable evaluations and unrepeatable experimental results. While human raters are often used to assess models for utility and safety, they introduce divergent biases and subjective opinions into their annotations. Overcoming this variance is exceptionall…
What the masters say
The JAX package is now around the same level, 20M monthly downloads. Which is incredibly fast growth, because 5 years ago I recall it being below 2M or so. It went from niche to mainstream in the past couple of years. Well deserved success.
@fchollet
The Keras package recently crossed 21M monthly downloads on PyPI, an all-time high (the daily ATH is around 900k). I still remember when it first crossed 10M monthly downloads about 5 years ago and I thought it couldn't possibly go any higher...
@fchollet
also all this: https://t.co/UvO0GnmPzX
@sama
Codex in the ChatGPT mobile app!
@sama
New course: Transformers in Practice. You'll get a practical view of how transformer-based LLMs work, so you can reason about their behavior, diagnose problems like slow inference, and make smarter decisions about deployment. This course is built in partnership with @AMD and htt…
@AndrewYNg
This reminds me of computerization. The amount of "work" people could execute on computers increased by a huge factor, but their productivity did not. The amount of work "needed" to arrive at the same high-level outputs exploded.
@fchollet
The quantity of code that devs ship has roughly 10xed. But net developer productivity (value created by unit of time) is only up by a bit, if at all. Part of it is that the additional code is solving more incremental problems. A bigger part is that the new code is creating…
@fchollet
being a dad is the thing that has most exceeded already-high-expectations in my whole life
@sama
Hacker News
Project Gutenberg – keeps getting better
798 pts · 182 comments · JSeiko
I believe there are entire companies right now under AI psychosis
1016 pts · 443 comments · reasonableklout
Additive Blending on the Nintendo 64
63 pts · 7 comments · ibobev
Ploopy Bean: a trackpoint for every computer
25 pts · 9 comments · jibcage
The main thing about P2P meth is that there's so much of it (2021)
79 pts · 68 comments · tomjakubowski
The bird eye was pushed to an evolutionary extreme
46 pts · 9 comments · sohkamyung
Naturally Occurring Quasicrystals
78 pts · 6 comments · lukeplato
SQL patterns I use to catch transaction fraud
40 pts · 2 comments · redbell
A 0-click exploit chain for the Pixel 10
349 pts · 165 comments · happyhardcore
Show HN: Epiq – Distributed Git based issue tracker TUI
32 pts · 8 comments · jolaflow
Ask FikAi in Deep Dive: "Go touch some grass" for a live digest.
Updated 5/16/2026, 4:47:31 AM