AI-Assisted Domain Name Selection for Threat Hunting

The use of domain name intelligence in cyber threat hunting is well established, as is the analysis of domain generation algorithms (DGAs) to characterize cyber threats and adversaries.

However, the emergence of Generative AI (GenAI) has led to an escalation in the use of adversarial AI by cyber threat actors. Its impact on defenders is dramatic: it is spawning more threat actors and enabling them to craft more convincing phishing attacks and to run more effective campaigns at a faster tempo and greater volume. Confronted with this new threat landscape, cyber defenders have no choice but to fight AI with AI.

In this post, we look at domain threat hunting from the analyst's perspective and demonstrate AI-assisted solutions enabled by DomainTools and our Generative AI toolset (ChatGPT 4.0, Microsoft Copilot, and …). Our use case, which focuses on the VexTrio cybercrime affiliate network, highlights the application of AI-based natural language processing (NLP) by both cyber adversaries and cyber defenders, and shows how DomainTools and GenAI can augment the skills of threat-hunting teams. We conclude with a review and an outlook.

Use Case: VexTrio Background

This post was inspired by a review of an Infoblox report on the VexTrio cybercriminal affiliate program and network infrastructure. [1] With more than 70,000 domains, the VexTrio network is the largest cybercrime network that Infoblox tracks. In particular, we were interested in VexTrio’s use of a DDGA (Dictionary Domain Generation Algorithm), a subtype of DGA (Domain Generation Algorithm), to continuously generate new domain names for its expanding cybercrime infrastructure.

Before discussing VexTrio, we need some context on the basics of domain names in cyber threat intelligence. Many, if not most, new domains are registered with malicious intent or for resale by domain squatters. Threat actors use DGAs to generate up to millions of random or semi-random domain names, which makes their malicious networks more resilient to takedowns: blocking or seizing a handful of domains achieves little when the malware can simply move on to the next generated name. DGA-generated domains typically look random and nonsensical, e.g., lminoeubybyvq[.]com. In contrast, domains generated by DDGAs combine one or more dictionary words, sometimes with digits or other characters, e.g., clickanalytics208[.]com, and the words (or lexemes) can hint at potential motives.
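To make the contrast concrete, here is a toy sketch of both styles. It is not VexTrio's actual algorithm: the word list, the date-style seed, and the "two words plus optional digits" pattern are assumptions chosen purely for illustration. What it does show is the property that makes DGAs resilient: anyone holding the same seed (the malware and its operator) derives the same list of domains.

```python
import random

# Hypothetical word list for illustration only; real DDGAs embed their own dictionaries.
WORDS = ["bonus", "prize", "click", "analytics", "win", "gift", "lucky", "reward"]
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def random_dga(seed: str, count: int, length: int = 13, tld: str = "com") -> list[str]:
    """Classic DGA: pseudo-random letter strings derived from a shared seed."""
    rng = random.Random(seed)  # same seed -> same list for the malware and its operator
    return ["".join(rng.choice(LETTERS) for _ in range(length)) + "." + tld
            for _ in range(count)]


def dictionary_dga(seed: str, count: int, tld: str = "com") -> list[str]:
    """Dictionary DGA (DDGA): two distinct words plus an optional numeric suffix."""
    rng = random.Random(seed)
    domains = []
    for _ in range(count):
        first, second = rng.sample(WORDS, k=2)  # two distinct dictionary words
        suffix = str(rng.randint(0, 999)) if rng.random() < 0.5 else ""
        domains.append(first + second + suffix + "." + tld)
    return domains


if __name__ == "__main__":
    seed = "2024-03-07"  # assumed date-based seed
    print(random_dga(seed, 3))      # 13 random letters + ".com"
    print(dictionary_dga(seed, 3))  # "<word><word>[digits].com"
```

The dictionary variant's output reads like plausible marketing names, which is exactly why DDGA domains are harder to flag on appearance alone than their random-letter counterparts.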

There are many DGA variants, as described by Cybereason and Akamai. [8,9] DomainTools uses machine learning classifiers to predict risk scores. [11] The MITRE ATT&CK model lists DGAs as a sub-technique (T1568.002, under Dynamic Resolution) that can be used to characterize threat actors. [7]

OSINT (Open Source Intelligence) analysts routinely gather data from many sources. Ideally, most sourcing is done programmatically through APIs and feeds into a Threat Intelligence Platform (TIP). Analysts also commonly augment automated collection with ad hoc collection: traditional search, followed by manual extraction and loading of indicators into their TIP. For ad hoc intelligence needs, GenAI provides significant advantages over traditional search. Figure 1 shows a dialog requesting intelligence on DDGA use by threat groups. While results always need to be verified, this example shows how GenAI can enhance CTI team operations by improving analyst productivity and advancing the knowledge and skills of analyst teams.

Figure 1. Ad hoc search (7-March-2024)
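Manual collection usually means copying defanged indicators (such as lminoeubybyvq[.]com) out of reports before loading them into a TIP. A minimal sketch of that clean-up step follows. The set of defanging conventions it handles is an assumption and not exhaustive, it targets domain names only (not IPv6 addresses), and the TIP import itself is out of scope.

```python
import re

# Defanging conventions handled here are an assumption and not exhaustive.
_DOT = re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE)
_SCHEME = re.compile(r"^(?:hxxps?|https?)(?:\[:\]|:)//", re.IGNORECASE)


def refang(indicator: str) -> str:
    """Turn a defanged domain indicator such as 'hxxps://bad[.]com/x' into 'bad.com'."""
    value = _DOT.sub(".", indicator.strip())
    value = _SCHEME.sub("", value)
    # Keep only the hostname: drop any path, then any port.
    value = value.split("/")[0].split(":")[0]
    return value.lower().rstrip(".")


def normalize_indicators(raw: list[str]) -> list[str]:
    """Refang and de-duplicate, preserving the order in which domains were first seen."""
    unique: dict[str, None] = {}
    for item in raw:
        domain = refang(item)
        if domain:
            unique.setdefault(domain, None)
    return list(unique)


if __name__ == "__main__":
    pasted = ["lminoeubybyvq[.]com", "hxxps://ClickAnalytics208[.]com/landing",
              "LMINOEUBYBYVQ.com"]
    print(normalize_indicators(pasted))  # → ['lminoeubybyvq.com', 'clickanalytics208.com']
```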

Returning to our VexTrio example, the next step was to build the intelligence collection and indicators for the VexTrio MaaS (malware-as-a-service) infrastructure, which supports the operations of more than 60 affiliates, or threat actors. The Infoblox report provided 17 domain indicators. Because we wanted more indicators, we also ingested indicators for VexTrio affiliates from three other trusted sources: CIS (Center for Internet Security), AlienVault OTX, and Sucuri. [3-5] All indicators were associated with fake browser update campaigns, and most were associated with the SocGholish and ClearFake affiliates and malware. Table 1 shows a summary of the 93 indicators by source and type, compiled using DomainTools.

Table 1. VexTrio Affiliate Domain Indicators generated from DomainTools
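Table 1 was compiled in DomainTools. For teams that want to reproduce the roll-up locally, a minimal sketch of the counting step follows. The `Indicator` record layout and the type labels are assumptions, the sample values are placeholders rather than real VexTrio indicators, and enrichment is not shown.

```python
from collections import Counter
from typing import NamedTuple


class Indicator(NamedTuple):
    value: str     # refanged indicator, e.g. a domain name
    ioc_type: str  # assumed labels such as "domain", "url", "ip"
    source: str    # e.g. "Infoblox", "CIS", "AlienVault OTX", "Sucuri"


def summarize(indicators: list[Indicator]) -> Counter:
    """Count unique indicators per (source, type); exact repeats are counted once."""
    return Counter((ind.source, ind.ioc_type) for ind in set(indicators))


def unique_across_sources(indicators: list[Indicator]) -> int:
    """Total distinct values, so an indicator reported by two sources counts once."""
    return len({ind.value for ind in indicators})


def print_table(counts: Counter) -> None:
    print(f"{'Source':<16}{'Type':<8}{'Count':>5}")
    for (source, ioc_type), n in sorted(counts.items()):
        print(f"{source:<16}{ioc_type:<8}{n:>5}")


if __name__ == "__main__":
    # Placeholder values, not real VexTrio indicators.
    sample = [
        Indicator("example-bonus.test", "domain", "Infoblox"),
        Indicator("example-bonus.test", "domain", "Sucuri"),
        Indicator("example-prize.test", "domain", "CIS"),
        Indicator("example-prize.test", "domain", "CIS"),
    ]
    print_table(summarize(sample))
    print("Unique indicators:", unique_across_sources(sample))
```

Keeping the per-source counts separate from the cross-source unique total matters here, because the same domain often appears in several reports.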

Threat Hunting Findings

Our analysis thus far has focused on historical data – a compilation of known threats primarily observed in the past year. While historical data is useful for pattern analysis and threat characterization, it does not meet our needs for proactive threat hunting. 

To build our threat hunting dataset, we used ChatGPT to perform basic NLP tasks on the 93 identified malicious domains: extracting the base-form words within the domains, generating a frequency distribution table, and creating a word cloud. The frequency distribution is shown in Figure 2. Note that ‘bonus’ and ‘prize’ were the two dominant words generated by the VexTrio DDGA.

Figure 2. ChatGPT frequency distribution for word forms in DDGA-generated domains
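Because GenAI output always needs verification, it helps to be able to reproduce the word extraction deterministically. The sketch below splits each domain label into known words, choosing the split that covers the most characters, and then counts the words. `VOCAB` is a hypothetical stand-in for a full English word list, this is one reasonable segmentation approach rather than what ChatGPT did internally, and it does not reduce words to base forms (for example, 'bonuses' to 'bonus'); that would need a lemmatizer.

```python
from collections import Counter

# Hypothetical vocabulary; in practice, load a full English word list.
VOCAB = {"bonus", "prize", "click", "analytics", "lucky", "win", "gift", "top"}


def segment(label: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split a domain label into dictionary words, maximizing the characters covered.

    best[i] holds (characters_covered, words) for the prefix label[:i]; digits and
    other characters that are not part of any known word are simply skipped.
    """
    best: list[tuple[int, list[str]]] = [(0, [])]
    for i in range(1, len(label) + 1):
        candidate = best[i - 1]  # option 1: leave label[i-1] unmatched
        for j in range(i):       # option 2: a known word ends at position i
            word = label[j:i]
            if word in vocab and best[j][0] + len(word) > candidate[0]:
                candidate = (best[j][0] + len(word), best[j][1] + [word])
        best.append(candidate)
    return best[-1][1]


def word_frequencies(domains: list[str]) -> Counter:
    """Frequency distribution of dictionary words across domain names."""
    counts: Counter = Counter()
    for domain in domains:
        parts = domain.lower().split(".")
        # Assumes a single-label TLD such as .com, so parts[-2] is the registered name.
        label = parts[-2] if len(parts) > 1 else parts[0]
        counts.update(segment(label))
    return counts


if __name__ == "__main__":
    sample = ["clickanalytics208.com", "luckybonus.com", "bonusprize7.com", "topprize.net"]
    print(word_frequencies(sample).most_common(3))
```

Maximizing coverage, rather than greedily taking the longest word at each position, avoids locking in a short word that prevents a better split later in the label.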

The Infoblox report on VexTrio provided a word cloud to graphically represent the DDGA words (Figure 17 in the report) [1]. Word clouds are an excellent visualization tool for threat hunting, helping analysts quickly spot meaningful terms or trends within a dataset. For small data collections, word clouds are unnecessary, since the analyst can quickly eyeball the prominent words in a table. But for large data collections, such as the 17,000 VexTrio words in the Infoblox GitHub CSV [2] or the 16M rows in ExtraHop’s DGA dataset on GitHub [10], word clouds can be a great help. For more on word clouds, see the Insightsoftware overview [12], or just ask ChatGPT.

Generating a word cloud from the 93 VexTrio domains in our collection turned out to be an easy task for ChatGPT, as shown in Figure 3. At a glance, we can see that ‘Bonus’ and ‘Prize’ are the dominant terms based on word frequency.

Figure 3. ChatGPT word cloud of the filtered domain results
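Under the hood, a word cloud is a frequency table rendered so that font size grows with count. The sketch below computes only that word-to-size mapping from any word-to-count mapping (such as a `collections.Counter`), using the standard library. The linear interpolation, the 10 to 72 point range, and the optional log scaling for skewed collections are assumptions, not how ChatGPT or Infoblox produced their clouds. The result can drive any renderer; alternatively, the third-party `wordcloud` package's `WordCloud.generate_from_frequencies` method accepts a word-to-count mapping directly.

```python
import math
from collections import Counter


def font_sizes(freqs: Counter, min_size: int = 10, max_size: int = 72,
               top_n: int = 50, log_scale: bool = False) -> dict[str, int]:
    """Map the top_n words to font sizes that grow with word frequency.

    Sizes are interpolated linearly between min_size and max_size; log_scale
    compresses very skewed distributions (useful for large collections).
    """
    top = freqs.most_common(top_n)
    if not top:
        return {}

    def weight(count: int) -> float:
        # log1p keeps zero counts defined while preserving the ordering
        return math.log1p(count) if log_scale else float(count)

    high, low = weight(top[0][1]), weight(top[-1][1])
    span = high - low
    return {
        word: round(min_size + ((weight(c) - low) / span if span else 1.0)
                    * (max_size - min_size))
        for word, c in top
    }


if __name__ == "__main__":
    counts = Counter({"bonus": 10, "prize": 8, "win": 1})
    print(font_sizes(counts))
```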

The final step in our domain threat hunting use case was to create a new data collection of domains that contain either of the dominant words ‘bonus’ or ‘prize’. The threat hunting collection was easy to create using the Advanced Search filters in DomainTools Iris. In the left panel of Figure 4, we used the Advanced Search settings to query for domains that contain the word ‘prize’ or ‘bonus’, were first seen in the last 7 days, and have a risk score greater than or equal to 90. This query returned 131 records. As shown in the right panel, every returned domain contains ‘prize’ or ‘bonus’.

Figure 4. Filtering domains containing ‘Prize’ OR ‘Bonus’
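The Iris query above is built in the UI. Where enriched records have already been exported, the same three criteria can be applied locally, for example to re-check results or combine them with other data. This is a minimal sketch over a hypothetical `DomainRecord` layout; it does not call any DomainTools API, and the field names and the 0 to 100 risk scale are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DomainRecord:
    domain: str
    first_seen: datetime  # timezone-aware timestamp
    risk_score: int       # assumed 0 to 100 scale


def hunt(records: list[DomainRecord], terms: tuple[str, ...] = ("prize", "bonus"),
         days: int = 7, min_risk: int = 90,
         now: Optional[datetime] = None) -> list[DomainRecord]:
    """Domains containing any term, first seen within `days`, with risk >= min_risk."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        r for r in records
        if any(term in r.domain.lower() for term in terms)  # plain substring match
        and r.first_seen >= cutoff
        and r.risk_score >= min_risk
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [  # placeholder records, not real hunt results
        DomainRecord("luckybonus.test", now - timedelta(days=2), 95),
        DomainRecord("prizezone.test", now - timedelta(days=30), 99),
    ]
    print([r.domain for r in hunt(records)])
```

Plain substring matching is broad by design: 'carbonusage' contains 'bonus'. Matching against dictionary-word splits of each label instead trades a little recall for fewer such false positives.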

Review and Outlook

Here’s a quick recap of what we demonstrated in the use case: 

  • We compiled a small data collection by manually extracting domain indicators from four trusted sources on VexTrio and its affiliates.
  • We enriched the dataset with metadata using DomainTools, resulting in a detailed historical record and baseline of VexTrio-related activity (93 records with extensive attributes).
  • We used GenAI to research DDGA use by actor, and to generate a word frequency distribution and word cloud to help the analyst understand the data.
  • We used DomainTools Iris Advanced Search to create a proactive threat hunting data set based on the dominant word frequencies from the DDGA. For more on DomainTools for threat hunting see [17].

Two important additional uses that we did not demonstrate:

  • By using the enriched intelligence from DomainTools, we can gain insight into adversary patterns (temporal, country, hosting provider, registrar). These insights help analysts associate indicators with actors and support takedowns of adversary networks through domain registrars and hosting providers.
  • We could also set up DomainTools monitors to track new domains matching these search terms.
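Neither use was demonstrated here, but the core logic of each is small. A minimal sketch, assuming enriched records are available as dictionaries with keys such as `registrar` and `country` (hypothetical field names, not a DomainTools schema) and that each hunt run yields a set of domain names: `pivot` tallies one attribute across records, and `new_matches` reports domains that the previous run did not return. A DomainTools monitor would do the latter server-side; this sketch does not use any DomainTools API.

```python
from collections import Counter


def new_matches(previous: set[str], current: set[str]) -> set[str]:
    """Domains returned by the current hunt run that the previous run did not return."""
    return current - previous


def pivot(records: list[dict], field: str) -> Counter:
    """Tally enriched records by one attribute (e.g. registrar, country, hosting provider)."""
    return Counter(record.get(field, "unknown") for record in records)


if __name__ == "__main__":
    # Placeholder data; field names are hypothetical.
    yesterday = {"luckybonus.test"}
    today = [
        {"domain": "luckybonus.test", "registrar": "Registrar A", "country": "US"},
        {"domain": "bonusprize7.test", "registrar": "Registrar A"},
        {"domain": "topprize.test", "registrar": "Registrar B", "country": "NL"},
    ]
    print(sorted(new_matches(yesterday, {r["domain"] for r in today})))
    print(pivot(today, "registrar").most_common())
    print(pivot(today, "country"))
```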

Looking ahead, adversarial AI services such as WormGPT are actively promoted on both the surface web and the dark web. [13-16] More threat actors will use these offerings. Phishing campaigns will become more convincing, even when run by less-skilled actors. Misinformation and disinformation campaigns, enabled by better generated text and deepfakes, will be more convincing and voluminous. As Adversarial AI has escalated from a theoretical to a real threat, defenders are compelled to adopt Defensive AI services. But before defensive AI can be effective at scale and in automation, solution providers will need to reduce hallucinations. Stay tuned for future posts on Adversarial AI.

References

  2. Infoblox GitHub – threat-intelligence domain indicators CSV, 4-March-2024
  3. CIS (Center for Internet Security) – CTAs Leveraging Fake Browser Updates in Malware Campaigns, 8-Feb-2024
  4. Sucuri Blog – New Wave of SocGholish Infections Impersonates WordPress Plugins, 1-March-2024
  5. AlienVault – VexTrio, “tag:Vextrio”, 10-Feb-2024
  6. Proofpoint – Are You Sure Your Browser is Up to Date? The Current Landscape of Fake Browser Updates, 17-Oct-2023
  7. MITRE ATT&CK – ID T1568.002, Dynamic Resolution: Domain Generation Algorithms, queried 5-March-2024
  8. Cybereason – Domain Generation Algorithm Variants, 2016
  9. Akamai – DGA Families with Dynamic Seeds: Unexpected Behavior in DNS Traffic, 6-Sept-2024
  10. ExtraHop Networks Blog – ExtraHop Shares Huge Dataset for Detecting Domains Generated by Algorithm on GitHub, 12-Sept-2023
  11. DomainTools – Domain Blooms: Identifying Domain Name Themes Targeted By Threat Actors, 13-May-2021
  12. Insightsoftware – Visualizing Text Analysis Results with Word Clouds, 4-May-2023
  13. – Meet the Brains Behind the Malware-Friendly AI Chat Service ‘WormGPT’, 8-Aug-2023
  14. Cyberint – A.I – Trick or T(h)reat?, 4-Oct-2023
  15. Darktrace – Navigating a New Threat Landscape: Breaking Down the AI Kill Chain, 6-Sept-2023
  16. Cloudflare – Dispelling the Generative AI fear: how Cloudflare secures inboxes against AI-enhanced phishing, 4-March-2024
  17. DomainTools – Leveraging Domain Intelligence for Threat Hunting, 19-July-2023
