Databricks Reverse ETL Setup
Set up Databricks as your Reverse ETL source.
At a high level, when you set up Databricks for Reverse ETL, the configured user needs read permissions for any resources (databases, schemas, tables) the query needs to access. Segment keeps track of changes to your query results with a managed schema (__SEGMENT_REVERSE_ETL
), which requires the configured user to allow write permissions for that schema.
Required permissions
-
Make sure the user or the service principal you use to connect to Segment has permissions to use that warehouse. In the Databricks console go to SQL warehouses and select the warehouse you’re using. Navigate to Overview > Permissions and make sure the user or the service principal you use to connect to Segment has can use permissions.
-
To grant access to read data from the tables used in the model query, run:
GRANT USAGE ON SCHEMA <schema_name> TO `<user or service principal you are using to connect to Segment>`; GRANT SELECT, READ_METADATA ON SCHEMA <schema_name> TO `<user or service principal you are using to connect to Segment>`;
-
To grant Segment access to create a schema to keep track of the running syncs, run:
GRANT CREATE on catalog <name of the catalog, usually hive_metastore or main if using unity-catalog> TO `<user or service principal you are using to connect to Segment>`;
-
If you want to create the schema yourself instead and then give Segment access to it, run:
CREATE SCHEMA IF NOT EXISTS __segment_reverse_etl; GRANT ALL PRIVILEGES ON SCHEMA __segment_reverse_etl TO `<user or service principal you are using to connect to Segment>`;
Set up guide
To set up Databricks as your Reverse ETL source:
- Log in to your Databricks account.
- Navigate to Workspaces and select the workspace you want to use.
- Select SQL in the main navigation.
- Select SQL Warehouses and select the warehouse you want to use. Note that Segment doesn’t support the
Compute
connection parameter. - Go to the Connection details tab.
- In a new tab on your browser, go to the Segment app.
- Navigate to Connections > Sources > Reverse ETL.
- Click + Add Reverse ETL source.
- Select Databricks and click Add Source.
- Enter the configuration settings for your Databricks source.
- Copy the Hostname, Http Path, and Port from the Databricks console from step 5.
- The Http Path in the above step should contain
/sql/1.0/warehouses
. Note that Segment doesn’t support/sql/protocolv1
. - To generate a Token, follow the steps listed in the Databricks docs. Segment recommends you create a token with no expiration date by leaving the lifetime field empty when creating it. If you already have a token with an expiration date, be sure to keep track of the date and renew it on time.
- Click Test Connection to see if the connection works. If the connection fails, make sure you have the right permissions and credentials, then try again.
- Click Create Source if the test connection is successful.
Once you’ve added your Databricks source, add a model and follow the rest of the steps in the Reverse ETL setup guide.
This page was last modified: 14 Sep 2023
Need support?
Questions? Problems? Need more info? Contact Segment Support for assistance!