Georgia Tech researchers have designed the first benchmark that tests how well existing AI tools can interpret advice from YouTube financial influencers, also known as finfluencers.
Michael Galarnyk, Ph.D. Machine Learning ’28, shares lead authorship with Veer Kejriwal, B.S. Computer Science ’25, and Agam Shah, Ph.D. Machine Learning ’26. Their co-authors are Yash Bhardwaj, École Polytechnique, M.S. Trustworthy and Responsible AI ’27; Nicholas Meyer, B.S. Electrical and Computer Engineering ’22 and Quantitative and Computational Finance ’24; Anand Krishnan, Stanford University, B.S. Computer Science ’27; and Sudheer Chava, Alton M. Costley Chair and professor of Finance at Georgia Tech.
Aptly named VideoConviction, the multimodal benchmark included hundreds of video clips. Experts labelled each clip with the influencer’s recommendation (buy, sell, or hold) and how strongly the influencer seemed to believe in their advice, based on tone, delivery, and facial expressions. The goal? To see how accurately AI can pick up on both the message and the conviction behind it.
“Our work shows that financial reasoning remains a challenge for even the most advanced models,” said Galarnyk. “Multimodal inputs bring some improvement, but performance often breaks down on harder tasks that require distinguishing between casual discussion and meaningful analysis. Understanding where these models fail is a first step toward building systems that can reason more reliably in high-stakes domains.”
All the numbers and hours of content came together to provide some surprising results. “While multimodal inputs improved ticker extraction (e.g., extracting Apple’s ticker AAPL),” explained Shah, “both text-based and multimodal models struggled to identify whether an influencer was actually making a buy or sell recommendation, often misclassifying general commentary as definitive recommendations.”
Key Takeaways From Portfolio Analysis
- Doing the opposite of what finfluencers recommend, like selling when they say to buy, led to better returns than simply investing in the S&P 500 index. This “inverse strategy” beat the market by 6.8% annually. However, it was riskier overall. For comparison, a popular tech-focused fund (QQQ) offered a smoother ride with better risk-adjusted performance.
- Conviction doesn’t guarantee success. Finfluencers who sounded more sure of their stock picks by using strong tone, detailed reasoning, and expressive delivery did better than those who seemed unsure. But even their high-conviction recommendations still didn’t perform as well as a simple investment in a tech-focused index fund like QQQ.
- Even the smartest AI still has trouble reading between the lines. The most advanced models that analyze both video and text couldn’t match human-level understanding when it came to spotting how confident a finfluencer was or telling the difference between real investment advice and casual commentary.
- Short clips work better than full videos. When AI models were given shorter, focused segments of financial influencer videos, they did a better job understanding the advice and picking up on key details. Breaking things down helped the models stay on track.
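
The gap between raw returns and risk-adjusted performance in the first takeaway can be illustrated with a minimal Python sketch. The article does not say which risk measure the researchers used, so the Sharpe ratio here is an assumption, chosen because it is a common one. The function names and the two return series are hypothetical toy values, not data from the study.

```python
import math

def inverse_position(recommendation):
    """Map a finfluencer call to the opposite position: -1 short, +1 long, 0 flat."""
    return {"buy": -1, "sell": 1, "hold": 0}[recommendation]

def annualized_return(returns, periods_per_year=252):
    """Geometric annualized return from a list of simple per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(returns)) - 1.0

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

# Hypothetical toy daily returns (NOT from the study): one bumpy series with a
# higher average return, and one steadier series with a lower average return.
volatile_strategy = [0.03, -0.02, 0.03, -0.02]
smooth_fund = [0.004, 0.003, 0.004, 0.003]

for name, series in [("volatile", volatile_strategy), ("smooth", smooth_fund)]:
    print(name, round(annualized_return(series), 3), round(sharpe_ratio(series), 2))
```

In this toy setup the bumpy series earns the higher annualized return, while the steadier series earns the higher Sharpe ratio. That is the same pattern the takeaway describes: a strategy can beat a benchmark on returns and still offer a rougher ride.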
The academic paper has drawn 850 downloads and more than 6,000 abstract views, a sign that investors are eager to know whom they can trust when it comes to financial advice. Proceed with caution: AI still can’t match human understanding of finfluencer content, an important gap as social media increasingly shapes retail investment decisions.
“Social media is rapidly reshaping how individuals engage with financial markets,” said Chava. “This research provides timely evidence on the influence of online financial content and offers valuable insights into the real-world consequences of following investment advice from digital platforms.”
