AI HN: AI News from Hacker News
52 results for "fine-tuning"
1
Compressed filesystems à la language models(grohan.co)
67 · grohan · 29 days ago · 14 comments
Inference Optimization · LLM Research
This post details a project that trains a language model to act as a filesystem, exposed through FUSE. The author generates training data with a filesystem simulator, capturing FUSE operations and state, then fine-tunes the model to handle read and write operations, emitting either file contents or the updated filesystem state.
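A minimal sketch of how such captured (state, operation) → output pairs might be serialized for fine-tuning; the JSON layout and the `make_example` helper are illustrative assumptions, not the author's actual format:

```python
import json

def make_example(fs_state: dict, op: str, args: dict, result: str) -> dict:
    """Serialize one captured FUSE operation into a prompt/completion pair.
    Field names are hypothetical; the post does not specify its exact format."""
    prompt = json.dumps({"state": fs_state, "op": op, "args": args}, sort_keys=True)
    return {"prompt": prompt, "completion": result}

# A captured read() on a tiny in-memory filesystem.
example = make_example(
    fs_state={"/hello.txt": "hi there"},
    op="read",
    args={"path": "/hello.txt", "size": 8, "offset": 0},
    result="hi there",
)
print(example["prompt"])
```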
2
Antislop: A framework for eliminating repetitive patterns in language models(arxiv.org)
120 · Der_Einzige · 2 months ago · 110 comments
AI Safety · Inference Optimization
This paper introduces Antislop, a framework for detecting and eliminating repetitive patterns ("slop") in large language model (LLM) output; such patterns degrade quality and make text easy to identify as AI-generated. The framework contributes three pieces: the Antislop Sampler for inference-time suppression, an automated pipeline that profiles slop and generates training data, and Final Token Preference Optimization (FTPO), a novel fine-tuning method. FTPO achieves a 90% reduction in slop on tasks such as GSM8K and creative writing while maintaining or improving performance, outperforming DPO, which degrades output quality.
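As a rough illustration of inference-time suppression (not the paper's Antislop Sampler), one can zero out the probability of any token that would complete a banned phrase before sampling and renormalize; the token IDs and ban list below are made up:

```python
# Hypothetical ban list: token-ID sequences for overused phrases.
BANNED_SEQUENCES = [[17, 42], [99, 7, 3]]

def suppress_banned(context: list[int], probs: dict[int, float]) -> dict[int, float]:
    """Zero the probability of any token that would complete a banned sequence,
    then renormalize. `probs` maps token id -> probability of the next token."""
    banned_next = set()
    for seq in BANNED_SEQUENCES:
        prefix, last = seq[:-1], seq[-1]
        if context[len(context) - len(prefix):] == prefix:
            banned_next.add(last)
    filtered = {t: p for t, p in probs.items() if t not in banned_next}
    total = sum(filtered.values()) or 1.0
    return {t: p / total for t, p in filtered.items()}

# The context ends with token 17, so token 42 (completing [17, 42]) is suppressed.
print(suppress_banned([5, 17], {42: 0.6, 8: 0.3, 12: 0.1}))
```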
3
The case for the return of fine-tuning(welovesota.com)
167 · nanark · 2 months ago · 81 comments
This blog post argues for renewed adoption of fine-tuning in large language model (LLM) deployments, pointing to recent developments such as Thinking Machines Labs' Tinker platform and Hugging Face's shift toward specialized models. It reviews why fine-tuning mattered in the past (e.g., ULMFiT, BERT) and its potential to return in customized AI systems.
4
Tinker by Thinking Machines(thinkingmachines.ai)
55 · ashvardanian · 3 months ago · 2 comments
Code & Development · AI Reasoning
This post introduces Tinker, a training API for researchers that gives them control over model training and fine-tuning while the platform handles the infrastructure. It supports models such as Qwen, Llama, GPT-OSS, DeepSeek, and Moonshot, and uses LoRA for efficient fine-tuning.
5
Announcing Tinker(thinkingmachines.ai)
152 · pr337h4m · 3 months ago · 89 comments
Code & Development · AI Reasoning
Tinker, a flexible API for fine-tuning language models, has entered private beta. It is a managed service that handles the distributed training infrastructure (scheduling, resource allocation, failure recovery) and uses LoRA to keep costs down, supporting open-weight models such as Qwen-235B-A22B. Researchers at Princeton, Stanford, Berkeley, and Redwood Research have already used it for tasks such as theorem proving, chemistry reasoning, and reinforcement learning experiments.
6
Extract-0: A specialized language model for document information extraction(arxiv.org)
195 · henriquegodoy · 3 months ago · 58 comments
This arXiv paper introduces Extract-0, a 7-billion-parameter language model optimized for document information extraction. Trained with synthetic data generation, LoRA fine-tuning, and GRPO reinforcement learning, it outperforms much larger models such as GPT-4.1 on extraction benchmarks while using far less compute.
7
LoRA Without Regret(thinkingmachines.ai)
184 · grantpitt · 3 months ago · 58 comments
Training Methods
LoRA is the dominant parameter-efficient fine-tuning method for large language models: instead of updating all of the weights, it trains small adapter matrices, which cuts cost and speeds up training. It also enables multi-tenant serving, where a single inference server handles many adapter versions at once, and is supported by modern inference engines such as vLLM.
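As a minimal sketch of that idea (not the article's code): LoRA freezes the base weight W and learns a low-rank update, so the layer computes y = Wx + (alpha/r) * B(Ax), with A of shape r×k and B of shape d×r, where r is much smaller than d and k. The PyTorch module below is an illustrative toy, not a production adapter:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA adapter: a frozen base layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = Wx + b + (alpha/r) * B(Ax); only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(1024, 1024), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16384 trainable params vs. ~1M for the full layer
```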
8
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof(arxiv.org)
181 · chosenbeard · 3 months ago · 122 comments
LLM Research
This arXiv paper, accepted to the EMNLP 2025 main conference, examines how large language models (LLMs) fall short on culture-specific communication norms, focusing on Persian taarof (ritualized politeness). It introduces TaarofBench, a benchmark of 450 role-play scenarios validated by native speakers. Frontier LLMs are 40-48% less accurate than native speakers at using taarof appropriately, and standard politeness metrics fail to capture non-Western norms. Supervised fine-tuning (+21.8%) and direct preference optimization (DPO, +42.3%) improve alignment with cultural expectations. The paper also includes a human study with native speakers, heritage speakers, and non-Iranian speakers to establish baselines.
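For reference, DPO fine-tunes on (prompt, preferred response, rejected response) triples with a contrastive objective; a minimal sketch of the per-example loss, where the log-probabilities are placeholders that real training code would compute from the policy and a frozen reference model:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Inputs are sequence log-probs under the policy and the frozen reference model."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Example: the policy already prefers the chosen (taarof-appropriate) response.
print(dpo_loss(-12.0, -20.0, -15.0, -18.0))  # positive margin of +5 gives a small loss
```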
9
Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs(github.com)
132 · jinqueeny · 3 months ago · 19 comments
LLM Research · RAG & Retrieval
Llama-Factory is a unified, efficient toolkit, available on GitHub, for fine-tuning more than 100 open-source large language models.
10
Launch HN: RunRL (YC X25) – Reinforcement learning as a service(runrl.com)
71 · ag8 · 3 months ago · 22 comments
This post introduces RunRL, a YC-backed reinforcement-learning-as-a-service platform. It lets users optimize AI models with custom reward functions, integrates with existing AI APIs such as OpenAI and Anthropic, and provides the compute needed for training (e.g., H100 GPUs). The platform offers an SDK for researchers and developers, with both self-serve and enterprise pricing.
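To make "custom reward function" concrete, here is the general shape such a function might take for a structured-output task; the (prompt, completion) -> float signature and the scoring rules are hypothetical, not RunRL's SDK:

```python
import re

def reward(prompt: str, completion: str) -> float:
    """Hypothetical reward: favor completions that look like the requested JSON
    object, penalize boilerplate refusals, and lightly penalize excessive length."""
    score = 0.0
    text = completion.strip()
    if text.startswith("{") and text.endswith("}"):
        score += 1.0                                      # looks like a JSON object
    if re.search(r"as an ai language model", completion, re.IGNORECASE):
        score -= 1.0                                      # boilerplate refusal
    score -= 0.001 * max(0, len(completion) - 500)        # mild length penalty
    return score

print(reward("Return a JSON object.", '{"answer": 42}'))  # 1.0
```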
11
Memory optimizations to reduce CPU costs(ayende.com)
56 · jbjbjbjb · 4 months ago · 18 comments
This post describes memory optimizations aimed at reducing CPU costs.
12
Dispelling misconceptions about RLHF(aerial-toothpaste-34a.notion.site)
120 · fpgaminer · 4 months ago · 32 comments
AI Safety · OpenAI Ecosystem
This post sets out to dispel common misconceptions about reinforcement learning from human feedback (RLHF), clarifying its actual role, limitations, and core mechanics in modern AI model development.
13
DoubleAgents: Fine-Tuning LLMs for Covert Malicious Tool Calls(pub.aimind.so)
98 · grumblemumble · 4 months ago · 30 comments
The DoubleAgents project fine-tunes large language models (LLMs) to make covert malicious tool calls, i.e., calls in which the model's intent to use harmful tools is hidden.
14
Fine-tuned small LLMs can beat large ones with programmatic data curation(tensorzero.com)
53 · GabrielBianconi · 5 months ago · 11 comments
LLM Research
Fine-tuned small language models (LLMs) can beat larger ones when combined with programmatic data curation.
15
Train a 70b language model at home (2024)(answer.ai)
71 · amrrs · 5 months ago · 13 comments
Training Methods
This 2024 post covers training a 70-billion-parameter language model at home.
16
Reinforcement Learning from Human Feedback (RLHF) in Notebooks(github.com)
72 · ash_at_hny · 6 months ago · 1 comment
AI Safety
This post focuses on doing reinforcement learning from human feedback (RLHF) in notebooks, covering how it fits into interactive coding environments for AI development and its practical use cases.
17
Fine-tuning LLMs is a waste of time(codinginterviewsmadesimple.substack.com)
193 · j-wang · 7 months ago · 88 comments
This post argues that fine-tuning large language models (LLMs) is a waste of time.
18
When Fine-Tuning Makes Sense: A Developer's Guide(getkiln.ai)
157 · scosman · 7 months ago · 62 comments
Inference Optimization · LLM Research
This developer-oriented guide lays out the scenarios and conditions under which fine-tuning an AI model is a practical and worthwhile approach.
19
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models(arxiv.org)
69 · mfiguiere · 8 months ago · 9 comments
Inference Optimization · LLM Research
💡 This arXiv paper describes inference-aware fine-tuning, a technique that trains large language models with best-of-N sampling in mind so that the model performs better under that inference-time strategy.
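For context, best-of-N sampling draws N candidate completions and keeps the one a scorer (e.g., a reward model or verifier) ranks highest; a minimal sketch, where both the sampler and the scorer are placeholders rather than the paper's setup:

```python
import random

def sample_completion(prompt: str, seed: int) -> str:
    """Placeholder for a stochastic model call (temperature > 0)."""
    random.seed(seed)
    return f"{prompt} -> candidate#{random.randint(0, 999)}"

def score(prompt: str, completion: str) -> float:
    """Placeholder reward model / verifier."""
    return float(hash((prompt, completion)) % 1000) / 1000.0

def best_of_n(prompt: str, n: int = 8) -> str:
    """Draw n candidates and keep the highest-scoring one."""
    candidates = [sample_completion(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Prove that 17 is prime.", n=8))
```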
20
Transformer Lab(transformerlab.ai)
170 · jonbaer · 9 months ago · 14 comments
💡 Transformer Lab is a development tool for training and fine-tuning transformer models.
21
Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning
101 · lmeierhoefer · 9 months ago · 60 comments
AI Agent
💡 Augento (YC W25) launches a fine-tuning service that uses reinforcement learning to improve AI agents.
22
Fine-tune Google's Gemma 3(unsloth.ai)
226 · tomdekan · 9 months ago · 78 comments
Google AI
💡 A guide to fine-tuning Google's Gemma 3 model.
23
RTX 5090 cable overheats to 150 degrees Celsius(tomshardware.com)
89 · LorenDB · 11 months ago · 4 comments
AI Chips
💡 Reports of an Nvidia RTX 5090 power cable overheating to 150 degrees Celsius; the card is widely used for local AI workloads such as inference and fine-tuning.
24
How to Train an AI Image Model on Yourself(coryzue.com)
207 · aberoham · 11 months ago · 42 comments
Image Generation
💡 A how-to on training an AI image model on yourself by fine-tuning it on personal data.
25
We fine-tuned Llama and got 4.2x Sonnet 3.5 accuracy for code generation(finecodex.com)
137 · banddk · 12 months ago · 77 comments
Anthropic & Claude · RAG & Retrieval
💡 The team fine-tuned Llama and reports 4.2x the code-generation accuracy of Sonnet 3.5.
26
Exploring LoRA – Part 1: The Idea Behind Parameter Efficient Fine-Tuning(medium.com)
166 · aquastorm · about 1 year ago · 15 comments
Training Methods
💡 Part 1 of a series exploring LoRA and the idea behind parameter-efficient fine-tuning.
27
OpenAI Reinforcement Fine-Tuning Research Program(openai.com)
229 · thm · about 1 year ago · 61 comments
OpenAI Ecosystem
💡 OpenAI announces a research program dedicated to reinforcement fine-tuning, a training method for its models.
28
PaliGemma 2: Powerful Vision-Language Models, Simple Fine-Tuning(developers.googleblog.com)
218 · meetpateltech · about 1 year ago · 27 comments
Google AI · Multimodal AI
💡 Google announces PaliGemma 2, powerful vision-language models designed for simple fine-tuning.
29
LoRA vs. Full Fine-Tuning: An Illusion of Equivalence(arxiv.org)
236 · timbilt · about 1 year ago · 53 comments
Training Methods
💡 An arXiv paper comparing LoRA and full fine-tuning, arguing that their apparent equivalence is an illusion.
30
Using reinforcement learning and $4.80 of GPU time to find the best HN post(openpipe.ai)
217 · kcorbitt · about 1 year ago · 95 comments
AI Chips
💡 Describes using reinforcement learning and $4.80 of GPU time to fine-tune a model that finds the best Hacker News posts.
Page 1 of 2 · 52 items
Hacker News | Powered by Doubao