Configuration

HarborSQL reads configuration from environment variables.

Unity Catalog Permissions

HarborSQL is designed for teams that already have Unity Catalog Delta tables and existing Databricks principals that can read those tables.

The HarborSQL-specific grant is EXTERNAL USE SCHEMA. Grant it on each schema you want to query from HarborSQL:

GRANT EXTERNAL USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<principal>`;

This privilege lets Unity Catalog vend temporary table credentials to an external engine. It does not replace the normal Unity Catalog read permissions. The same principal still needs the permissions it would need to read the table through Databricks:

GRANT USE CATALOG ON CATALOG <catalog> TO `<principal>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<principal>`;
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `<principal>`;

SELECT may also be inherited from a schema or catalog grant. HarborSQL does not need static cloud credentials; Unity Catalog vends temporary table credentials at query time.
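
When several principals or schemas need the same setup, the four statements above can be generated rather than typed by hand. The following is a minimal sketch, not part of HarborSQL: the helper name `render_grants` and the example principal, catalog, schema, and table names are hypothetical, and it assumes the identifiers need no escaping beyond the backticks around the principal.

```python
from typing import List


def render_grants(principal: str, catalog: str, schema: str, table: str) -> List[str]:
    """Return the three read grants plus EXTERNAL USE SCHEMA as SQL strings.

    Hypothetical helper: it only formats text and does not validate or
    escape identifiers, so pass trusted names only.
    """
    who = f"`{principal}`"
    schema_name = f"{catalog}.{schema}"
    return [
        # Normal Unity Catalog read permissions.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who};",
        f"GRANT USE SCHEMA ON SCHEMA {schema_name} TO {who};",
        f"GRANT SELECT ON TABLE {schema_name}.{table} TO {who};",
        # The HarborSQL-specific grant that allows credential vending.
        f"GRANT EXTERNAL USE SCHEMA ON SCHEMA {schema_name} TO {who};",
    ]


if __name__ == "__main__":
    # Example names are placeholders, not values HarborSQL expects.
    for statement in render_grants("analysts", "main", "sales", "orders"):
        print(statement)
```

If SELECT is already inherited from a schema or catalog grant, the third statement can be dropped from the generated list.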

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| HARBORSQL_DATABRICKS_HOST or DATABRICKS_HOST | yes | none | Databricks workspace URL or host; defaults to https:// when no scheme is supplied and rejects http:// unless explicitly allowed |
| HARBORSQL_UNSAFE_ALLOW_HTTP_DATABRICKS_HOST | no | false | Allows an http:// Databricks host value for local non-Databricks test endpoints only; do not use with real Databricks bearer tokens |
| HARBORSQL_BIND_ADDR | no | 127.0.0.1:1992 | HTTP bind address |
| HARBORSQL_DEFAULT_CATALOG or DATABRICKS_CATALOG | no | workspace | Default catalog for unqualified queries |
| HARBORSQL_DEFAULT_SCHEMA or DATABRICKS_SCHEMA | no | default | Default schema for unqualified queries |
| HARBORSQL_AWS_REGION | no | us-west-2 | AWS region passed to Delta object-store access |
| HARBORSQL_MAX_RESULT_ROWS | no | 100000 | Maximum rows HarborSQL will materialize for one query; set to an empty value to disable |
| HARBORSQL_MAX_RESULT_BYTES | no | 67108864 | Maximum retained Arrow result page bytes HarborSQL will materialize for one query; set to an empty value to disable |
| HARBORSQL_UNITY_TIMEOUT_SECONDS | no | 30 | Timeout for Unity Catalog HTTP requests |
| HARBORSQL_QUERY_TIMEOUT_SECONDS | no | 300 | Timeout for each query execution |
| HARBORSQL_IDLE_SESSION_TIMEOUT_SECONDS | no | 1800 | Idle timeout for Thrift sessions |
| HARBORSQL_COMPLETED_OPERATION_TTL_SECONDS | no | 600 | Retention time for completed Thrift operations and their materialized results |
| HARBORSQL_CLEANUP_INTERVAL_SECONDS | no | 60 | Background cleanup interval for expired sessions and operations |
| HARBORSQL_MAX_SESSIONS | no | 256 | Maximum concurrent Thrift sessions |
| HARBORSQL_MAX_OPERATIONS | no | 512 | Maximum retained Thrift operations |
| HARBORSQL_REQUEST_BODY_LIMIT_BYTES | no | 1048576 | Maximum HTTP request body size |
| HARBORSQL_PARQUET_PUSHDOWN_FILTERS | no | true | Enable DataFusion Parquet filter pushdown and late materialization |
| HARBORSQL_PARQUET_REORDER_FILTERS | no | same as HARBORSQL_PARQUET_PUSHDOWN_FILTERS | Reorder pushed-down Parquet filters heuristically |
| HARBORSQL_TARGET_PARTITIONS | no | max of available CPU parallelism and 32 | DataFusion target partition count |
| HARBORSQL_SKIP_PARTIAL_AGGREGATION_PROBE_ROWS_THRESHOLD | no | 10000 | Rows per partition DataFusion samples before bypassing partial aggregation for high-cardinality group keys |
| HARBORSQL_SKIP_PARTIAL_AGGREGATION_PROBE_RATIO_THRESHOLD | no | 0.8 | Distinct-groups/input-rows ratio that triggers partial aggregation bypass |
| HARBORSQL_TABLE_CACHE_TTL_SECONDS | no | 300 | Maximum lifetime for token-scoped cached table providers; set to 0 to disable |
| HARBORSQL_TABLE_CACHE_MAX_ENTRIES | no | 1024 | Maximum token/table/region cache entries; set to 0 to disable |
| HARBORSQL_UNSAFE_LOG_SQL | no | false | Include redacted SQL text in internal tracing spans for controlled debugging; SQL is omitted from logs by default |
| DATABRICKS_TOKEN | query mode only | none | Token used by harborsql query --sql ... |
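
The host rule described for HARBORSQL_DATABRICKS_HOST and HARBORSQL_UNSAFE_ALLOW_HTTP_DATABRICKS_HOST can be illustrated with a short sketch. This is one possible reading of the documented behavior, not HarborSQL's actual code: the function name `normalize_host` and the sample hostnames are hypothetical, and details such as trimming whitespace are assumptions.

```python
def normalize_host(value: str, allow_http: bool = False) -> str:
    """Apply the documented scheme rule to a Databricks host value.

    Sketch only (assumed behavior): a bare host gets https://, an https://
    URL is kept as-is, and an http:// URL is rejected unless allow_http
    is set.
    """
    value = value.strip()
    if not value:
        raise ValueError("a Databricks host is required")

    lowered = value.lower()
    if lowered.startswith("https://"):
        return value
    if lowered.startswith("http://"):
        if not allow_http:
            raise ValueError("http:// hosts are rejected unless explicitly allowed")
        return value
    # No scheme supplied: default to HTTPS.
    return "https://" + value


if __name__ == "__main__":
    # Hostnames below are placeholders.
    print(normalize_host("example.cloud.databricks.com"))
    print(normalize_host("http://localhost:8080", allow_http=True))
```

Keeping the HTTP escape hatch as an explicit opt-in means a mistyped scheme fails loudly instead of sending a bearer token over plaintext.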

Safety Defaults

  • Real Databricks workspaces must use HTTPS.
  • SQL text is not logged by default.
  • Result materialization is bounded by row and byte limits.
  • Session and operation retention are bounded by count and TTL.
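
The row and byte limits come from HARBORSQL_MAX_RESULT_ROWS and HARBORSQL_MAX_RESULT_BYTES, where an empty value disables the limit. The sketch below shows one way such a three-state setting (unset, empty, numeric) might be read. It is an assumption about the semantics, not HarborSQL's parser, and the helper name `read_limit` is hypothetical.

```python
import os
from typing import Mapping, Optional


def read_limit(name: str, default: int, env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Read an optional integer limit from the environment.

    Assumed semantics for this sketch:
      - variable unset      -> use the documented default
      - variable set to ""  -> limit disabled (returns None)
      - anything else       -> parsed as an integer
    """
    raw = env.get(name)
    if raw is None:
        return default
    raw = raw.strip()
    if raw == "":
        return None
    return int(raw)  # raises ValueError on non-numeric input


if __name__ == "__main__":
    rows = read_limit("HARBORSQL_MAX_RESULT_ROWS", 100000)
    size = read_limit("HARBORSQL_MAX_RESULT_BYTES", 67108864)
    print("row limit:", rows, "byte limit:", size)
```

Distinguishing "unset" from "empty" is what lets the safe default apply automatically while still allowing a deliberate opt-out.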