Skill: HG Data Query Discipline
Turn ICP criteria into ClickHouse queries that return clean data on the first try — safe patterns, verified names, and enforced limits.
Overview
Teach Claude to translate natural-language ICP criteria into safe, efficient ClickHouse SQL via hg_data_query. Covers table schemas, JOIN patterns, safe vs. dangerous SQL constructs, LIMIT enforcement, and common query templates for technographic, firmographic, spend, and category-sprawl detection.
Use cases
ABM list building that doesn't blow your credit budget
Every hg_data_query call follows safe ClickHouse patterns — CTE + NOT IN for exclusion, enforced LIMIT, ORDER BY intensity DESC. No unbounded queries, no HTTP 500s from NOT EXISTS, no LEFT JOIN false-zeroes.
ICP queries a data engineer would approve
Claude verifies every product name against hg_catalog and list_product_categories before querying. Joins install_global to company_locations on url_id with is_ghq = true. Groups by url_id with HAVING for sprawl detection. The SQL reads like it was written by someone who knows ClickHouse.
View full skill
HG Data Query Discipline
When to use
- Building account lists from ICP criteria via
hg_data_query - Any workflow step that writes ClickHouse SQL against HG tables
- Translating natural-language filters into structured queries
Pre-flight: always verify names first
Before writing any SQL, call these tools to resolve names to exact values:
hg_catalog— discover available tables, columns, and join keyslist_product_categories— verify category names (e.g., "IT Service Desk", not "ITSM")get_vendor_information— look up vendor/product IDs and exact names
Never guess a category, vendor, or product name. A typo returns zero rows silently.
Key tables
| Table | Rows | Join key | Use |
|---|---|---|---|
install_global | 299M | url_id | Current product installs — technographic queries |
company_locations | 3.9M | url_id | Firmographic data (revenue, employees, industry, country) |
install_intensity_momentum_global | 800M | url_id | Rate of change (intensity_momentum) |
install_intensity_trending_global | 500M | url_id | Trending direction (intensity_trending) |
install_time_series_global | 5.5B | url_id | Monthly snapshots (weighted_score x date_verified) |
spend_absolute | 20M | url_id | IT spend data |
products | 96K | product_id | Product catalog |
Standard JOIN: install_global ig JOIN company_locations cl ON ig.url_id = cl.url_id with cl.is_ghq = true (global HQ only — avoids duplicate locations).
Safe ClickHouse patterns
CTE + NOT IN (has X but not Y)
WITH has_sap AS (
SELECT DISTINCT url_id FROM install_global
WHERE vendor_name = 'SAP SE'
)
SELECT cl.name, cl.country_name, cl.revenue_min
FROM install_global ig
JOIN company_locations cl ON ig.url_id = cl.url_id
WHERE ig.vendor_name = 'Oracle Corporation'
AND cl.is_ghq = true
AND ig.url_id NOT IN (SELECT url_id FROM has_sap)
ORDER BY ig.intensity DESC
LIMIT 100
Category sprawl detection
SELECT cl.name, cl.url_id,
count(DISTINCT ig.vendor_id) AS vendor_count,
groupArray(DISTINCT ig.vendor_name) AS vendors
FROM install_global ig
JOIN company_locations cl ON ig.url_id = cl.url_id
WHERE ig.category_leaf_name = 'IT Service Desk'
AND cl.is_ghq = true
AND cl.revenue_min >= 1000000000
GROUP BY cl.name, cl.url_id
HAVING vendor_count >= 3
ORDER BY vendor_count DESC
LIMIT 100
Multi-value aggregation
Use groupArray(DISTINCT column) + has() for multi-value checks within a GROUP BY.
CASE for segment classification
SELECT cl.name,
CASE
WHEN vendor_count >= 5 THEN 'Severe sprawl'
WHEN vendor_count >= 3 THEN 'Moderate sprawl'
ELSE 'Dual-stack'
END AS severity
FROM ...
Dangerous patterns — NEVER use
| Pattern | Problem | Fix |
|---|---|---|
NOT EXISTS subquery | Returns HTTP 500 on ClickHouse | Use CTE + NOT IN |
LEFT JOIN for exclusion | Returns 0 rows unreliably | Use CTE + NOT IN |
Query without LIMIT | Can return 10K+ rows, blow token budget | Always include LIMIT (default 100) |
| Guessed product/category name | Returns 0 rows silently | Verify via hg_catalog / list_product_categories first |
LIMIT enforcement
- Always include a LIMIT clause. Default: 100. Max: 200.
- Use
ORDER BY intensity DESC(ormax_intensity DESCin GROUP BY) so the most relevant accounts surface first within the limit. - For momentum queries,
ORDER BY intensity_momentum ASC(most declining first).
Credit budget
| Tool | Credits |
|---|---|
hg_data_query | Dynamic (reported by upstream API per query) |
hg_catalog | 0 (free metadata lookup) |
get_vendor_information | 0 (free catalog lookup) |
list_product_categories | 0 (free catalog lookup) |
A typical list-build is 4–8 queries (hg_catalog + name verification + 2–4 data queries). Catalog lookups are free; hg_data_query credits vary by query complexity.
Reference
hg-insights-api.md#hg_data_query— tool documentationhg-credit-awareness— credit budget management