Skill: HG Data Query Discipline

Turn ICP criteria into ClickHouse queries that return clean data on the first try — safe patterns, verified names, and enforced limits.

Overview

Teach Claude to translate natural-language ICP criteria into safe, efficient ClickHouse SQL via hg_data_query. Covers table schemas, JOIN patterns, safe vs. dangerous SQL constructs, LIMIT enforcement, and common query templates for technographic, firmographic, spend, and category-sprawl detection.

Use cases

  • ABM list building that doesn't blow your credit budget

    Every hg_data_query call follows safe ClickHouse patterns — CTE + NOT IN for exclusion, enforced LIMIT, ORDER BY intensity DESC. No unbounded queries, no HTTP 500s from NOT EXISTS, no LEFT JOIN false-zeroes.

  • ICP queries a data engineer would approve

    Claude verifies every product name against hg_catalog and list_product_categories before querying. Joins install_global to company_locations on url_id with is_ghq = true. Groups by url_id with HAVING for sprawl detection. The SQL reads like it was written by someone who knows ClickHouse.

View full skill

HG Data Query Discipline

When to use

  • Building account lists from ICP criteria via hg_data_query
  • Any workflow step that writes ClickHouse SQL against HG tables
  • Translating natural-language filters into structured queries

Pre-flight: always verify names first

Before writing any SQL, call these tools to resolve names to exact values:

  1. hg_catalog — discover available tables, columns, and join keys
  2. list_product_categories — verify category names (e.g., "IT Service Desk", not "ITSM")
  3. get_vendor_information — look up vendor/product IDs and exact names

Never guess a category, vendor, or product name. A typo returns zero rows silently.

Key tables

TableRowsJoin keyUse
install_global299Murl_idCurrent product installs — technographic queries
company_locations3.9Murl_idFirmographic data (revenue, employees, industry, country)
install_intensity_momentum_global800Murl_idRate of change (intensity_momentum)
install_intensity_trending_global500Murl_idTrending direction (intensity_trending)
install_time_series_global5.5Burl_idMonthly snapshots (weighted_score x date_verified)
spend_absolute20Murl_idIT spend data
products96Kproduct_idProduct catalog

Standard JOIN: install_global ig JOIN company_locations cl ON ig.url_id = cl.url_id with cl.is_ghq = true (global HQ only — avoids duplicate locations).

Safe ClickHouse patterns

CTE + NOT IN (has X but not Y)

WITH has_sap AS (
  SELECT DISTINCT url_id FROM install_global
  WHERE vendor_name = 'SAP SE'
)
SELECT cl.name, cl.country_name, cl.revenue_min
FROM install_global ig
JOIN company_locations cl ON ig.url_id = cl.url_id
WHERE ig.vendor_name = 'Oracle Corporation'
  AND cl.is_ghq = true
  AND ig.url_id NOT IN (SELECT url_id FROM has_sap)
ORDER BY ig.intensity DESC
LIMIT 100

Category sprawl detection

SELECT cl.name, cl.url_id,
       count(DISTINCT ig.vendor_id) AS vendor_count,
       groupArray(DISTINCT ig.vendor_name) AS vendors
FROM install_global ig
JOIN company_locations cl ON ig.url_id = cl.url_id
WHERE ig.category_leaf_name = 'IT Service Desk'
  AND cl.is_ghq = true
  AND cl.revenue_min >= 1000000000
GROUP BY cl.name, cl.url_id
HAVING vendor_count >= 3
ORDER BY vendor_count DESC
LIMIT 100

Multi-value aggregation

Use groupArray(DISTINCT column) + has() for multi-value checks within a GROUP BY.

CASE for segment classification

SELECT cl.name,
  CASE
    WHEN vendor_count >= 5 THEN 'Severe sprawl'
    WHEN vendor_count >= 3 THEN 'Moderate sprawl'
    ELSE 'Dual-stack'
  END AS severity
FROM ...

Dangerous patterns — NEVER use

PatternProblemFix
NOT EXISTS subqueryReturns HTTP 500 on ClickHouseUse CTE + NOT IN
LEFT JOIN for exclusionReturns 0 rows unreliablyUse CTE + NOT IN
Query without LIMITCan return 10K+ rows, blow token budgetAlways include LIMIT (default 100)
Guessed product/category nameReturns 0 rows silentlyVerify via hg_catalog / list_product_categories first

LIMIT enforcement

  • Always include a LIMIT clause. Default: 100. Max: 200.
  • Use ORDER BY intensity DESC (or max_intensity DESC in GROUP BY) so the most relevant accounts surface first within the limit.
  • For momentum queries, ORDER BY intensity_momentum ASC (most declining first).

Credit budget

ToolCredits
hg_data_queryDynamic (reported by upstream API per query)
hg_catalog0 (free metadata lookup)
get_vendor_information0 (free catalog lookup)
list_product_categories0 (free catalog lookup)

A typical list-build is 4–8 queries (hg_catalog + name verification + 2–4 data queries). Catalog lookups are free; hg_data_query credits vary by query complexity.

Reference