Skip to main content

HG Data Query

Execute read-only SQL SELECT queries against the HG Insights data warehouse with billions of rows of technographic, intent, and spend data.

Data Coverage

MetricValueSource
Warehouse rowsBillionsPhoenix MCP tools overview: "data warehouse with billions of rows of technographic, intent, and spend data". Verified 2026-05-11.

Run read-only SQL against the HG Insights data warehouse.

Parameters

NameTypeRequiredDefaultDescription
querystringYes-Read-only SQL query to execute. Must start with SELECT or WITH.
max_rowsnumberNo1000Maximum number of rows to return (1-10,000).

Required Integrations

  • hginsights_v2

Credit Cost

Credits are dynamic1 credit per row returned by the query. The actual cost is reported by the upstream HG Insights API in the credits_consumed field of each response.

  • A query returning 100 rows costs 100 credits
  • A query returning 0 rows costs 0 credits
  • hg_catalog is free (0 credits)

Available Tables

Use hg_catalog to discover the full schema. Key tables include:

TableApproximate RowsDescription
company_locations~30MCompany lookup and firmographics
install_global~296MCurrent product installations
install_time_series_global~27BHistorical verification events
install_intensity_by_month_global~3.9BMonthly intensity scores
install_intensity_momentum_global~102MUsage acceleration
install_intensity_trending_global~81MUsage trending
intent_global~7.3BBuyer intent signals
spend_absolute-IT spend data

Most tables can be joined to company_locations via the url_id column.

Security Guardrails

  • SELECT only: Queries must start with SELECT or WITH. DDL/DML keywords (INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, etc.) are blocked.
  • Single statement: Semicolons are rejected to prevent multi-statement payloads.
  • Timeout: Queries that do not complete within 30 seconds are aborted.
  • Row cap: Maximum 10,000 rows per query (enforced via max_rows parameter).
  • The upstream API may impose additional restrictions beyond client-side validation.

Use Cases

  • Custom analytics across the full HG data warehouse
  • Cross-table joins not available through individual MCP tools
  • Aggregation queries (COUNT, SUM, AVG) across large datasets
  • Time-series analysis of technology adoption trends
  • Building custom company lists with complex filtering criteria
  • Total Addressable Market (TAM) / market sizing — see below

TAM / Market-Sizing

hg_data_query is the way to answer TAM-shaped questions about a specific product or vendor. The customer-data query layer already supports the underlying capability; the flow is anchored on the customer's own product/vendor (TAM is "who could buy my product?") and routes through get_vendor_information to resolve product IDs and their categoryIds in a single call.

There are three TAM shapes:

  • installed_base — companies that already use the anchor product. "How many mid-market NA companies installed Splunk?"
  • competitive_displacement — companies on a different vendor in the same category as the anchor. "Companies running a SIEM that's not Splunk."
  • whitespace — companies in the category with no vendor at all. "Mid-market companies with no SIEM yet."

The flow:

  1. Resolve the anchor via get_vendor_information(vendorName: "[product]"). Capture the product IDs and the distinct categoryId values across those products. The categoryIds define the implied competitive set, so there is no need to enumerate every competitor's product manually.
  2. Discover the warehouse tables via hg_catalog (free, 0 credits). Identify the install table (install_global), the firmographic table (company_locations), and the join key (url_id).
  3. Compose the SQL using one of the three patterns below.
  4. Execute via hg_data_query and present the result.

Pattern A — installed_base

SELECT COUNT(DISTINCT cl.company_id) AS company_count
FROM install_global ig
JOIN company_locations cl USING (url_id)
WHERE ig.product_id IN (:anchorProductIds)
AND cl.country IN ('US', 'CA') AND cl.employees BETWEEN 100 AND 1000;

Pattern B — competitive_displacement

The category_id filter avoids any need for a category-filtered vendor lookup.

SELECT DISTINCT cl.company_id, cl.company_name, cl.country
FROM install_global ig
JOIN company_locations cl USING (url_id)
WHERE ig.category_id IN (:anchorCategoryIds)
AND ig.product_id NOT IN (:anchorProductIds)
AND cl.country IN ('US', 'CA')
LIMIT 1000;

Pattern C — whitespace

SELECT cl.company_id, cl.company_name
FROM company_locations cl
WHERE cl.country IN ('US', 'CA') AND cl.employees BETWEEN 100 AND 1000
AND cl.url_id NOT IN (
SELECT url_id FROM install_global WHERE category_id IN (:anchorCategoryIds)
)
LIMIT 1000;

Example Usage

Basic company lookup

{
"tool": "hg_data_query",
"parameters": {
"query": "SELECT url_id, company_name, country, employee_count FROM company_locations WHERE company_name ILIKE '%Salesforce%' LIMIT 10"
}
}

Find top technologies at a company

{
"tool": "hg_data_query",
"parameters": {
"query": "SELECT ig.product_name, ig.vendor_name, ig.intensity FROM install_global ig JOIN company_locations cl ON ig.url_id = cl.url_id WHERE cl.company_name ILIKE '%Cisco%' ORDER BY ig.intensity DESC LIMIT 20"
}
}

Aggregate intent signals by topic

{
"tool": "hg_data_query",
"parameters": {
"query": "SELECT topic, COUNT(*) as signal_count FROM intent_global WHERE url_id IN (SELECT url_id FROM company_locations WHERE country = 'US' AND employee_count > 1000) GROUP BY topic ORDER BY signal_count DESC LIMIT 10",
"max_rows": 10
}
}

Monthly intensity trend

{
"tool": "hg_data_query",
"parameters": {
"query": "SELECT month, avg_intensity FROM install_intensity_by_month_global WHERE url_id = 12345 AND product_name = 'Snowflake' ORDER BY month DESC LIMIT 12"
}
}

Response Format

{
"rows": [
{ "url_id": 12345, "company_name": "Acme Corp", "country": "US" }
],
"columns": ["url_id", "company_name", "country"],
"row_count": 1,
"credits_consumed": 1
}
  • rows: Array of row objects with column names as keys
  • columns: Array of column names returned by the query
  • row_count: Number of rows returned
  • credits_consumed: Actual credits consumed (dynamic, reported by upstream API)

Error Handling

Common errors:

  • Invalid SQL: Query rejected if it contains disallowed keywords or doesn't start with SELECT/WITH
  • Query rejected by warehouse: The upstream API returns details about SQL syntax errors or invalid table/column references
  • Timeout: Query did not complete within 30 seconds
  • Authentication: HG Insights API key not configured or invalid
  • Rate limit: API rate limit exceeded

Best Practices

  1. Always call hg_catalog first to discover available tables and columns before writing queries
  2. Use LIMIT to control row count and credit consumption
  3. Join via url_id — this is the primary key connecting most tables to company_locations
  4. Start small — test queries with LIMIT 10 before running larger queries
  5. Use specific columns instead of selecting all columns to improve query performance

Next Steps