Prepare to onboard data

This video tutorial briefly introduces how to prepare DataFoundry to onboard data by configuring:

  • User accounts and permissions
  • Data sources and domains
  • Databricks cluster templates

Steps

  1. Create and configure a user account
    • Name: Training
    • Email: (email) 
    • Password: (password)
    • Roles: Data Modeller, Database Admin
  2. Create a domain, and assign its users and artifacts
    • Name: Training_Domain
    • Description: RDBMS and file-based data, onboarded for learning purposes
    • Users: (unchecked) 
    • Training: (checked)
  3. Define an RDBMS data source, for later configuration and onboarding
    • Source Name: Training_RDBMS
    • Source Type: Oracle
    • Target Schema: Training_RDBMS
    • Target Location: /iw/sources/Training_RDBMS
    • Make Publicly Available: (unchecked)
  4. Define a File data source, for later configuration and onboarding
    • Source Name: Training_CSV
    • Source Type: Structured Files (CSV)
    • Target Schema: Training_CSV
    • Target Location: /iw/sources/Training_CSV
    • Make Publicly Available: (unchecked)
  5. Assign artifacts to the domain
    • Sources: (all)
    • Domains: (none)
  6. Define an ephemeral cluster template, to control later jobs
    • Duplicate Existing Template: Default_Cluster_Template
    • Cluster Template Name: Standard_NoPool_NoML
    • Description: Non-pooled jobs without ML processes
    • Cluster Template Mode: Standard
    • Cluster Pool ID: n/a
    • Max Workers: 8
    • Worker Type: Standard_D8s_v3
    • Driver Type: Standard_D8s_v3
    • Terminate After: (checked) / 10
    • Enable Auto Scaling: (checked) / Default Min: 1 / Default Max: 2
    • Support for ML Workflows: (unchecked)
  7. Assign cluster template to sources and domains
    • Sources: (all)
    • Domains: Training_Domain
  8. Log in to the new user account
    • Review Data Catalog and Domains
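
Before entering the values from steps 3, 4 and 6 in the UI, it can help to write them down as plain data and check them for consistency. The following is a minimal Python sketch and does not use any DataFoundry API. The class names, field names and the problems() helper are hypothetical and exist only for illustration. It assumes that the target schema and target location follow the source name, as they do in the steps above, and that the autoscaling maximum should not exceed Max Workers, which the tutorial does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceDefinition:
    """Hypothetical record of one data source from steps 3 and 4."""
    name: str
    source_type: str
    publicly_available: bool = False

    @property
    def target_schema(self) -> str:
        # In the steps above, the target schema matches the source name.
        return self.name

    @property
    def target_location(self) -> str:
        # In the steps above, the location is /iw/sources/<source name>.
        return f"/iw/sources/{self.name}"


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical record of the cluster template from step 6."""
    name: str
    max_workers: int
    worker_type: str
    driver_type: str
    terminate_after: int  # the unit is not stated in the steps above
    autoscale_min: int
    autoscale_max: int
    ml_workflows: bool = False

    def problems(self) -> List[str]:
        """Return descriptions of settings that look inconsistent."""
        found = []
        if self.autoscale_min < 1:
            found.append("autoscale_min must be at least 1")
        if self.autoscale_min > self.autoscale_max:
            found.append("autoscale_min exceeds autoscale_max")
        # Assumption: the autoscaling range should fit within Max Workers.
        if self.autoscale_max > self.max_workers:
            found.append("autoscale_max exceeds max_workers")
        return found


sources = [
    SourceDefinition("Training_RDBMS", "Oracle"),
    SourceDefinition("Training_CSV", "Structured Files (CSV)"),
]

template = ClusterTemplate(
    name="Standard_NoPool_NoML",
    max_workers=8,
    worker_type="Standard_D8s_v3",
    driver_type="Standard_D8s_v3",
    terminate_after=10,
    autoscale_min=1,
    autoscale_max=2,
)

for source in sources:
    print(source.name, source.target_schema, source.target_location)
print(template.problems() or "cluster template settings are consistent")
```

With the values from this tutorial, problems() returns an empty list. Changing Default Max to a value above Max Workers would be reported under the assumption stated above.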