Building thematic investment baskets with generative AI
July 2025
Investors struggle to identify companies truly exposed to emerging megatrends. Static sector tags miss nuance, and manual research is slow and incomplete. With Bigdata.com’s generative AI research tools, thematic baskets can now be built dynamically and at scale.
What used to take weeks of manual tagging, reading, and curation can now be accomplished in minutes.
Use case example: tracking the Supply Chain Reshaping theme in the S&P 500
Traditional sector screens cannot detect company exposure to cross-sector megatrends like “Supply Chain Reshaping.” This use case shows how to mine earnings call transcripts to dynamically identify and rank companies based on true thematic relevance, without relying on static sector classifications.
Why it matters
Long-term megatrends continue to attract growing capital flows, with global thematic ETFs managing hundreds of billions of dollars in assets. However, identifying true company exposure within these themes often remains a manual and subjective process. Leveraging AI-driven thematic screening can accelerate idea generation and improve portfolio precision by automating and refining the selection of relevant companies.
This use case shows how to:
- Define a theme (e.g. supply chain reshaping)
- Generate a taxonomy of sub-themes (e.g. nearshoring, robotics, diversification of suppliers)
- Search earnings transcripts using hybrid retrieval
- Label relevant chunks using LLMs
- Score and rank companies by thematic exposure
Check out our cookbook to build your own thematic baskets.
Workflow summary with Bigdata.com
1. Define your theme and scope.
Start by setting the theme (Supply Chain Reshaping), universe (S&P 500), time window (last 12 months), and analysis model (GPT-4o-mini).
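As a rough illustration, the four inputs above could be captured in a small configuration object. This is a sketch only: `ScreenConfig` and its field names are hypothetical and are not taken from bigdata-research-tools, and the 365-day lookback is an assumed reading of "last 12 months".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ScreenConfig:
    """Hypothetical container for the four inputs named in step 1."""

    theme: str
    universe: str
    lookback_days: int
    llm_model: str

    def window(self, today: date) -> tuple[date, date]:
        """Return the (start, end) dates of the analysis window."""
        return today - timedelta(days=self.lookback_days), today


config = ScreenConfig(
    theme="Supply Chain Reshaping",
    universe="S&P 500",
    lookback_days=365,  # assumption: "last 12 months" approximated as 365 days
    llm_model="gpt-4o-mini",
)

start, end = config.window(date(2025, 7, 1))
print(f"{config.theme} | {config.universe} | {start} to {end}")
```

Keeping the scope in one immutable object makes it easy to rerun the same screen with a different theme or universe.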
2. Build a dynamic taxonomy.
Instead of relying on static sector tags, generate a living taxonomy of sub-themes. For example:
- Nearshoring to reduce transport costs
- Drones and robotics to streamline delivery and inventory
- Diversifying suppliers to reduce dependency and risk
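One way to picture such a taxonomy is as a small tree whose leaves become search queries. The sketch below hand-writes the three example sub-themes; in the workflow described here they would be generated by an LLM. `ThemeNode` is a hypothetical class used for illustration, not the toolkit's own data structure.

```python
from dataclasses import dataclass, field


@dataclass
class ThemeNode:
    """Hypothetical tree node: a theme or sub-theme with a short summary."""

    label: str
    summary: str = ""
    children: list["ThemeNode"] = field(default_factory=list)

    def leaves(self) -> list["ThemeNode"]:
        """Return the most specific sub-themes (nodes without children)."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hand-written here; assumed to come from an LLM in the real workflow.
taxonomy = ThemeNode(
    label="Supply Chain Reshaping",
    children=[
        ThemeNode("Nearshoring", "reduce transport costs"),
        ThemeNode("Drones and robotics", "streamline delivery and inventory"),
        ThemeNode("Supplier diversification", "reduce dependency and risk"),
    ],
)

# Each leaf becomes one search query for the retrieval step.
search_queries = [f"{leaf.label}: {leaf.summary}" for leaf in taxonomy.leaves()]
for query in search_queries:
    print(query)
```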
3. Search transcripts with hybrid retrieval.
Comb through earnings calls using a blend of keyword, semantic, and cross-encoder search to surface the most relevant insights hidden within company narratives.
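To make the idea of blending rankings concrete, here is a minimal sketch that merges a keyword ranking and a semantic ranking with reciprocal rank fusion. This is an assumed, commonly used fusion technique chosen for illustration; the source does not say how the Bigdata API combines its signals, and the cross-encoder reranking stage is not shown. The chunk IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in, so chunks
    ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical results from two retrievers for the same query.
keyword_hits = ["chunk_a", "chunk_b", "chunk_c"]
semantic_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Rank-based fusion avoids having to normalize keyword and embedding scores onto a common scale, which is why it is a popular default for hybrid search.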
4. Validate thematic relevance with LLMs.
An LLM then acts as your analyst, confirming whether each mention truly addresses the theme, filtering out noise, and distinguishing core players from incidental mentions.
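A validation step of this kind might look like the sketch below: build a prompt, ask the model for a JSON verdict, and fall back to "unclear" whenever the answer is malformed or off-taxonomy. The prompt wording, the `label_chunk` helper, and the `fake_llm` stand-in are all hypothetical; a real run would replace `fake_llm` with a call to a model such as GPT-4o-mini.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "Theme: {theme}\n"
    "Sub-themes: {labels}\n"
    "Text: {text}\n"
    'Reply with JSON: {{"label": <one sub-theme or "unclear">, "reason": <short string>}}'
)


def label_chunk(text: str, theme: str, labels: list[str], llm: Callable[[str], str]) -> str:
    """Ask the model for a sub-theme label; return "unclear" on any doubt."""
    prompt = PROMPT_TEMPLATE.format(theme=theme, labels=", ".join(labels), text=text)
    try:
        label = json.loads(llm(prompt)).get("label", "unclear")
    except (json.JSONDecodeError, AttributeError):
        return "unclear"  # malformed or non-object reply
    return label if label in labels else "unclear"  # reject off-taxonomy labels


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to keep the sketch runnable."""
    text = prompt.split("Text: ", 1)[1]
    if "Mexico" in text:
        return '{"label": "Nearshoring", "reason": "production moved closer to market"}'
    return '{"label": "unclear", "reason": "no link to the theme"}'


labels = ["Nearshoring", "Drones and robotics", "Supplier diversification"]
theme = "Supply Chain Reshaping"
relevant = label_chunk("We moved assembly to Mexico.", theme, labels, fake_llm)
noise = label_chunk("Revenue grew 5% year over year.", theme, labels, fake_llm)
print(relevant, noise)
```

Treating every unparseable or unexpected answer as "unclear" keeps noisy model output from inflating a company's exposure.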
5. Score and rank exposure.
Finally, companies are scored and ranked based on how frequently and meaningfully they align with the theme, revealing who is best positioned in this megatrend.
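A simple frequency-based version of this scoring can be sketched as follows. It counts validated chunks per company and sub-theme and ranks companies by their total; the actual scoring formula is not given in this article, so the counting rule is an assumption, and the tickers and labels are made-up sample data.

```python
from collections import Counter, defaultdict

# Hypothetical (company, validated label) pairs from the labeling step.
labeled_chunks = [
    ("AAA", "Nearshoring"),
    ("AAA", "Nearshoring"),
    ("AAA", "Supplier diversification"),
    ("BBB", "Drones and robotics"),
    ("CCC", "unclear"),
]


def score_companies(chunks: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count validated chunks per company and sub-theme, skipping "unclear"."""
    scores: dict[str, Counter] = defaultdict(Counter)
    for company, label in chunks:
        if label != "unclear":
            scores[company][label] += 1
    return dict(scores)


def rank_companies(scores: dict[str, Counter]) -> list[tuple[str, int]]:
    """Rank companies by total validated mentions, highest first."""
    totals = {company: sum(counts.values()) for company, counts in scores.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


scores = score_companies(labeled_chunks)
ranking = rank_companies(scores)
for company, total in ranking:
    print(company, total, dict(scores[company]))
```

Keeping the per-sub-theme counts alongside the total is what allows a breakdown by sub-theme in addition to the overall rank.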
What emerged
- A ranked list of the top 10 S&P 500 companies reshaping supply chains
- Sub-theme level scoring for deeper strategic insights
- Far greater precision than static keyword tagging alone


Good to know: this process is embedded in the Bigdata.com App, powering thematic baskets like Health & Wellness, Energy Transition, AI in Power Demand, Water Scarcity Solutions, Militarization, Precision Farming and more.
Ideal for
Thematic investors, ETF issuers, quant and discretionary managers, innovation analysts, strategy teams, and data scientists building explainable AI screeners.
Practical applications
- Idea generation at scale: Go from headlines to holdings in minutes.
- Portfolio construction: Build dynamic baskets aligned with emerging themes.
- Investment committee prep: Highlight companies best positioned in long-term trends.
- Thematic risk analysis: Assess portfolio exposure to macro narratives.
Technical stack
- Bigdata API for document access and hybrid search
- bigdata-research-tools for taxonomy creation, search at scale, and content validation and labeling
- LLMs (e.g. GPT-4o-mini) to validate content relevance
- Vector storage for low-latency retrieval
Note: bigdata-research-tools is a data science toolkit for running miners, screeners, and custom workflows at scale, supporting both standard and advanced research use cases.
Unlike traditional screeners that rely only on keyword tagging or pre-set taxonomies, this method uses generative AI to create adaptive, narrative-driven taxonomies and validate them with LLM labeling, ensuring higher precision and relevance.
Ready to build your own thematic baskets?
Explore our cookbook for step-by-step notebooks and deploy scalable thematic screeners today.