While the prevailing wisdom in AI has been that upgrading to a better model gets you better results, a new study from MIT Sloan affiliates suggests that only half of the performance boost from switching to a more advanced AI was due to the model itself. The other half came from something much harder to acquire: how people adapted their prompts.
“People often assume that better results come mostly from better models,” said David Holtz, an assistant professor at Columbia University and a research affiliate at the MIT Initiative on the Digital Economy (MIT, 2025), of the large-scale study. “The fact that nearly half the improvement came from user behavior really challenges that belief.”
Nearly 1,900 participants were asked to recreate a reference image, such as a photo, graphic, or artwork, using OpenAI’s DALL-E. They were randomly assigned to one of three setups:
- DALL-E 2 (baseline)
- DALL-E 3 (more advanced)
- DALL-E 3 with their prompts automatically rewritten by GPT-4, without their knowledge
Each person had 25 minutes to submit at least 10 prompts, with a bonus payment for the top 20 percent of performers. The time pressure and reward structure encouraged iteration, experimentation, and competitive refinement.
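To make the setup concrete, here is a minimal sketch of the three experimental arms using the OpenAI Python SDK. The model names, the rewrite instruction, and the helper functions are illustrative assumptions, not the study's actual code.

```python
# Hypothetical sketch of the study's three arms (not the researchers' code).
from openai import OpenAI

client = OpenAI()

def rewrite_prompt(user_prompt: str) -> str:
    """Arm 3: silently rewrite a participant's prompt with GPT-4.
    The rewrite instruction here is an assumption for illustration."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Rewrite this image-generation prompt to be more detailed."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return resp.choices[0].message.content

def generate_image(user_prompt: str, arm: str) -> str:
    """Return the URL of an image generated under the given experimental arm."""
    if arm == "dalle2":
        model, prompt = "dall-e-2", user_prompt                 # baseline
    elif arm == "dalle3":
        model, prompt = "dall-e-3", user_prompt                 # more advanced model
    else:  # "dalle3_rewrite"
        model, prompt = "dall-e-3", rewrite_prompt(user_prompt) # unseen rewrite
    result = client.images.generate(model=model, prompt=prompt, n=1)
    return result.data[0].url
```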
DALL-E 3 beat DALL-E 2 at producing images closer to the reference, but only half of that improvement was due to the model itself. The rest came from how people instinctively modified their prompting style, writing prompts that were 24 percent longer, more descriptive, and more similar in structure to one another. Notably, the skill wasn’t tied to coding knowledge. In fact, the best prompters were often people who could communicate visual ideas clearly in everyday language.
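The article doesn’t spell out how “closeness to the reference” was scored. One common approach, sketched below under the assumption of CLIP image embeddings, is the cosine similarity between the embedding of the generated image and that of the reference; the study’s actual metric may differ.

```python
# Assumed scoring approach: cosine similarity of CLIP image embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_similarity(reference_path: str, generated_path: str) -> float:
    """Cosine similarity between two images' CLIP embeddings (higher = closer)."""
    images = [Image.open(p).convert("RGB") for p in (reference_path, generated_path)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize each embedding
    return float(emb[0] @ emb[1])

# Example: score = image_similarity("reference.png", "generated.png")
```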
But the biggest surprise came from the third group, whose prompts were automatically rewritten by GPT-4. Although the rewrites were intended to improve results, they degraded performance by 58 percent relative to the group using DALL-E 3 without rewrites. The system often added details users didn’t intend or subtly changed their meaning, steering the AI away from the target image.
Eaman Jahani, an assistant professor at the University of Maryland and a digital fellow at the MIT Initiative on the Digital Economy, noted that users who started at the lower end of performance improved the most, narrowing the gap between novices and experts. This suggests that generative AI, combined with user adaptation, can actually reduce inequality in output.
AI performance is shaped as much by human adaptability as by machine capability. In other words, the next big leap in results might not come from the next big model, but from the people learning to speak its language.