A study by researchers at the University of Chicago found that OpenAI’s GPT-4 can outperform human analysts at predicting the direction of future earnings from financial statement analysis. The result has significant implications for financial analysis and decision-making, because it suggests that large language models can augment and streamline the work of financial professionals.
GPT-4 Prediction Accuracy
- GPT-4 achieved a prediction accuracy of 60.35% in determining the direction of future earnings, surpassing the 52.71% accuracy of human analysts. The model also outperformed analysts in terms of F1-score, which balances precision and recall, with GPT-4 scoring 60.90% compared to 54.48% for human analysts.
- The study utilized anonymized financial data from the Compustat database, spanning from 1968 to 2021, and compared GPT-4’s performance to human analysts’ predictions sourced from the IBES database. By removing company names and dates from the standardized financial statements provided to GPT-4, the researchers ensured a fair comparison between the model and human analysts.
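To make the two metrics concrete, here is a minimal sketch of how accuracy and F1-score can be computed for a binary "earnings up or down" prediction task. The labels below are made-up toy data, not figures from the study, and the function name `accuracy_and_f1` is purely illustrative. The study may also aggregate F1 differently (for example, averaged across classes), so treat this as the textbook definition rather than the authors' exact procedure.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 means earnings increased."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")

    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    correct = sum(1 for a, p in pairs if a == p)

    accuracy = correct / len(pairs)
    # Precision: share of predicted increases that were real increases.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of real increases that were predicted.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy example (hypothetical data): 1 = earnings rose, 0 = earnings fell.
actual_direction = [1, 0, 1, 1, 0, 1, 0, 0]
predicted_direction = [1, 0, 0, 1, 1, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(actual_direction, predicted_direction)
print(f"accuracy={accuracy:.4f}, f1={f1:.4f}")
```

Because F1 is the harmonic mean of precision and recall, a predictor that only ever says "increase" is penalized on precision, which accuracy alone would not show.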
Comparison with ANNs
- GPT-4’s performance was comparable to that of advanced machine learning models, such as artificial neural networks (ANNs), built specifically for earnings prediction. In certain respects, GPT-4 even outperformed these specialized models.
- The ANNs used in the comparison were state-of-the-art, narrowly specialized applications. That a general-purpose language model performed on par with them suggests that general-purpose AI can rival or surpass purpose-built models in complex analytical tasks.
Chain of Thought Methodology
- The researchers employed a “Chain of Thought” (CoT) reasoning approach with GPT-4, mimicking the analytical steps a human analyst would take. This method involved identifying changes in financial statements, computing key financial ratios, and synthesizing this information to predict earnings trends.
- The CoT prompts played a pivotal role in the results. By guiding GPT-4 through the same sequence of steps a human analyst would follow, they allowed the model to generate accurate predictions even from raw financial data stripped of context.
Skepticism and Challenges
- Despite the promising results, some skepticism remains. Critics have questioned the validity of comparing GPT-4’s performance with that of human analysts and specialized ANNs, pointing out potential differences in the complexity of tasks and the models used for comparison.
- The study acknowledges the difficulty of pinpointing exactly how and why GPT-4 performs well, which reflects the broader challenge of understanding the inner workings of large language models. Separately, AI researcher Matt Holden has noted that GPT-4 is unlikely to be able to select stocks that outperform broad indexes such as the S&P 500.