Learn how to use Tree Schema to build an effective data catalog.
Overview
Tree Schema provides you with the essential data catalog capabilities required to build a strong data culture. In this tutorial we’ll walk through how to build your Tree Schema data catalog from scratch so that you can get up and running in no time!
Here are the topics, in the order in which they will be covered. Feel free to skip ahead at any time! This tutorial provides a comprehensive overview of how to populate your Tree Schema data catalog, but you can always find more details and step-by-step instructions in our help & documentation as well.
- Add your data stores
- Add schemas to your data stores
- Populate sample values for your fields
- Data lineage
- Capture your team’s knowledge
- Define your keywords in the data dictionary
- Tag your data assets
Add your data stores
The very first thing you will want to do in Tree Schema is add your data stores: nearly all of your other data assets (tables, fields, sample values, etc.) reside within a data store. When you create a data store you have two options, which you will see shortly:
- Connect directly to your data store: this enables Tree Schema to automatically populate your data catalog on your behalf and is the suggested approach. With this approach you can populate your entire Tree Schema catalog in under 5 minutes!
- Create the data store without a connection: there may be times when you cannot connect Tree Schema to your data store. Perhaps the data store is owned by one of your vendors and you do not have access, or Tree Schema does not yet have an automated connector for it. Creating a data store without a connection lets you manually define the data assets within the data store.
When you first log into your Tree Schema account you will be on your organization dashboard. Navigate to the Data Stores page using the left-hand navigation bar.

Select the Create Data Store button in the top right corner.

Tree Schema will display its automated connectors. Select the one that corresponds to your data store.

As a quick aside: if there is a database or any other tool you need that we don’t support yet, please let us know; we are always adding new connectors! In the meantime, we suggest using the Other type for it.
For this example, we will be using the Postgres connector. After you select your data store type, you will need to enter a data store name and assign users to two roles:
- Technical point of contact: this is primarily for informational purposes within Tree Schema. If your users need help connecting to the database, downloading drivers, or have general technical questions, the technical point of contact helps them get directly to the person who can help.
- Data steward: data stewards are used within Tree Schema both for informational purposes, helping your users understand whom to contact when they have questions about using your data, and for assigning governance actions. Any governance action associated with a data asset is assigned to the corresponding data steward.

After you select Next, you will be given the option to connect to your data store.
Connect directly to your data store
As mentioned above, connecting Tree Schema to your data store will save you tons of time as Tree Schema will do all of the heavy lifting to populate your data catalog on your behalf. Select “Yes, set up connection” to create the automated connection.

Each data store has its own unique set of fields required for the connection. Fill in the fields for your data store and hit “Test connection” to see if Tree Schema can reach your data store! If your data sits behind a firewall or is otherwise not reachable over the public internet, you can set up a jump server to route your traffic through.

The final step when creating a connected data store is to determine which teams will have access to view the sample values within this data store. We will cover the specifics later on, in the sample values section, but the important thing to know for now is that a user in any of the teams granted full access to the data store will always be able to see the sample values within that data store. If a user is not in any of the teams with full access, their ability to view specific sample values within the data store can be revoked.
Here we’ve given the default team full access to the data store.

That’s it! When you hit Submit, you will see your data store listed on the Data Stores page.

Create the data store without a connection
To create a data store without a connection, follow the same steps as above, but when you reach the connection screen select “No, create it manually”.

This will complete the data store creation process and you will see your data store listed under your data stores view.

Add schemas to your data stores
A schema represents the shape and semantics of your data. If your data store is a SQL database, your schema may be represented as a table; if your data store is a NoSQL database or file store, your schema may be represented as JSON objects or Parquet files. The common denominator is that a schema sits within a data store and has one or more fields.
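The containment rule above (a data store holds schemas, and every schema holds at least one field) can be pictured as a small data model. This is a minimal sketch: the `DataStore`, `Schema`, and `Field` classes are hypothetical illustrations, not part of any Tree Schema SDK.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical classes illustrating the hierarchy; not Tree Schema's API.

@dataclass
class Field:
    name: str
    data_type: str

@dataclass
class Schema:
    name: str
    fields: List[Field]

    def __post_init__(self) -> None:
        # A schema must have one or more fields.
        if not self.fields:
            raise ValueError(f"schema {self.name!r} must have at least one field")

@dataclass
class DataStore:
    name: str
    schemas: List[Schema] = field(default_factory=list)

    def add_schema(self, schema: Schema) -> None:
        # Every schema sits inside a data store.
        self.schemas.append(schema)

warehouse = DataStore("analytics_postgres")
warehouse.add_schema(Schema("public.users", [Field("id", "integer"), Field("email", "string")]))
```

Validating in `__post_init__` means an empty schema can never be constructed, which mirrors the "one or more fields" rule.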
There are three ways to create schemas within Tree Schema:
- Automated schema creation from your data store: This is the recommended approach and should be used when you have created a connected data store. With this approach Tree Schema will not only identify the schemas that exist within your data store but it will also populate the fields for each schema and sample values for each field.
- Automated schema inference from a file: You can automatically create a schema by providing a sample file. This is useful if you do not have a connected data store to represent your data.
- Manual schema definition: Just as the name implies, you can manually create or adjust your schemas.
To add a new schema you first need to navigate to the data store that the schema will reside within and then add the schema within that data store. Select the data store that you created in the steps above to navigate to the data store details.

In the details panel at the bottom, the Schemas tab will be selected. There are two buttons here, Manage Existing Schemas and Add New Schemas. These two buttons are fairly self-descriptive; for now, we will select Add New Schemas.
This brings up a modal where you can choose to create schemas directly from the data store or manually, either from a sample file or by defining the entire schema yourself.

Automated schema creation from your data store
To add schemas automatically, select the Automatically from Data Store button in the pop-up. A short message explains that, depending on the type of data store and the number of schemas it contains, retrieving all of the results may take a few minutes. Hit Next to confirm and have Tree Schema capture your schemas.

When the results load you will be able to choose which schemas are saved and which schemas should have their fields added. By default, Tree Schema selects all of your schemas to be saved and adds fields for each schema. You can exclude any schemas you do not want here.

When you are done, hit Submit to save your schemas! If your data store contains hundreds or thousands of schemas, it may take a few minutes for the results to fully load. Tree Schema will send you an email once all of your schemas have been added.

When you close the modal and refresh the page you will see your schemas in the details section at the bottom.

Automated schema inference from a file
If you have a file that represents your schema (maybe you created an extract from your database, or a client sent you a sample file), you can upload it to Tree Schema and have it automatically create the schema based on the contents of the file.
Open the Add New Schemas modal again, but this time select Manually / Sample File.

You will need to enter the schema name and assign the technical point of contact and the data steward. (When you create schemas automatically from a data store, Tree Schema instead applies the data store’s steward and technical point of contact to all of the data assets it creates.)

The final step in the process is to upload a file. You will see the empty schema definition view with a button at the top to upload a sample file. Select the button to infer the schema from a sample file, choose your file extension and select a file.

When you hit submit the file will be uploaded and Tree Schema will infer your schema.

Hit Submit at the bottom of the page to save your schema.
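The tutorial does not describe how Tree Schema infers a schema internally, so the following is only an illustration of the general idea: read one JSON record, label each value with a simple type, and flatten nested objects into dot-notation field names. The function names and type labels are hypothetical.

```python
import json
from typing import Any, Dict

def json_type(value: Any) -> str:
    # Map a value parsed from JSON to a simple, hypothetical type label.
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return "object"

def infer_schema(record: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Infer a flat {dotted_field_name: type} mapping from one JSON record."""
    fields: Dict[str, str] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Embedded objects become dot-notation field names.
            fields.update(infer_schema(value, prefix=name + "."))
        else:
            fields[name] = json_type(value)
    return fields

sample = json.loads('{"id": 7, "active": true, "address": {"city": "Austin", "zip": "78701"}}')
print(infer_schema(sample))
# → {'id': 'number', 'active': 'boolean', 'address.city': 'string', 'address.zip': 'string'}
```

A production inference would typically scan many records and reconcile conflicting types; this sketch looks at a single record to keep the idea visible.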
Manual schema definition
The manual schema definition follows the same steps as inferring a schema from a file; the only difference is that you can manually define all or part of the schema. In the schema definition you can add new fields, change data types, or create sample values for your fields.
Tree Schema uses “dot” notation for embedded objects, so if you create a schema definition with the fields field.sub_field and field.sub_field2, both of type string, it will create a set of fields with the following structure:
{
  "field": {
    "sub_field": "string",
    "sub_field2": "string"
  }
}
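To make the dot-notation rule concrete, here is a minimal sketch that expands dotted field names into nested objects. The `nest_fields` helper is hypothetical, and the input names `field.sub_field` and `field.sub_field2` are assumed so that the output matches the structure above.

```python
import json
from typing import Any, Dict

def nest_fields(flat: Dict[str, str]) -> Dict[str, Any]:
    """Expand dotted field names (hypothetical helper) into nested objects."""
    root: Dict[str, Any] = {}
    for dotted_name, data_type in flat.items():
        *parents, leaf = dotted_name.split(".")
        node = root
        for part in parents:
            # Create each intermediate object the first time it is seen.
            node = node.setdefault(part, {})
        node[leaf] = data_type
    return root

definition = {"field.sub_field": "string", "field.sub_field2": "string"}
print(json.dumps(nest_fields(definition), indent=2))
```

Using `setdefault` lets sibling fields such as `sub_field` and `sub_field2` share the same parent object instead of overwriting it.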
Populate sample values for your fields
Sample values allow your users to understand what specific values exist for each field and what those values mean. They are critical to allowing your data users to use your data effectively.
From the data store details page select Visit Schema for one of your schemas that was created.

This will bring up the schema details page. Now, select a specific field to update its sample values.

The field that I’ve chosen only has one sample value populated. Edit the description by selecting the edit button on the right side.

This brings up the edit sample value modal. You can change the sample value itself, its description, and whether or not users without full access to the data store can view this value.

Back when we created a connected data store, we gave teams full access to the data store. A user will not be able to see this sample value if all three of the following are true:
- The user is not in one of the teams that has full access to the data store
- The user is not an admin
- The value for Allow users without full access to this Data Store to view & edit this value? is set to “No”
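The three conditions above combine with a logical AND: the value is hidden only when every one of them holds. A minimal sketch of that rule follows; the function and parameter names are hypothetical and only restate the conditions listed in this section.

```python
from typing import Set

def can_view_sample_value(
    user_teams: Set[str],
    is_admin: bool,
    full_access_teams: Set[str],
    allow_without_full_access: bool,
) -> bool:
    """Hypothetical restatement of the visibility rule described above."""
    in_full_access_team = bool(user_teams & full_access_teams)
    # Hidden only when ALL three conditions are true.
    hidden = (not in_full_access_team) and (not is_admin) and (not allow_without_full_access)
    return not hidden

# A user outside the "default" team, not an admin, with the toggle set to "No":
print(can_view_sample_value({"sales"}, False, {"default"}, False))  # False
```

Flipping any single condition (joining a full-access team, being an admin, or setting the toggle to "Yes") makes the value visible again.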
You can also add additional sample values as needed from the field details page.
Data lineage
Data lineage allows your data users to understand how your data moves and is a critical capability for all data users, whether they are researching potential impacts when making changes to a data pipeline or trying to understand which of the six date fields in a given table should be used for reporting.
Define your data flows
In order to have data lineage you first need to capture how your data moves. Tree Schema captures data movements through Transformations. A Transformation in Tree Schema is simply a reference to data moving from a field in one schema to a field in another schema.
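Because a transformation is just a set of references from source fields to target fields, it can be modeled as a list of field-to-field links. The `FieldRef` and `Transformation` classes below are hypothetical illustrations of that idea, not Tree Schema’s internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass(frozen=True)
class FieldRef:
    """Hypothetical pointer to one field: data store, schema and field name."""
    data_store: str
    schema: str
    field_name: str

@dataclass
class Transformation:
    """Hypothetical transformation: a named set of source -> target field links."""
    name: str
    links: List[Tuple[FieldRef, FieldRef]] = field(default_factory=list)

    def connect(self, source: FieldRef, target: FieldRef) -> None:
        # One link records data moving from a source field to a target field.
        self.links.append((source, target))

    def source_schemas(self) -> Set[Tuple[str, str]]:
        # A transformation may draw from more than one source schema.
        return {(src.data_store, src.schema) for src, _ in self.links}

etl = Transformation("load_dim_user")
etl.connect(FieldRef("app_db", "users", "email"), FieldRef("warehouse", "dim_user", "email"))
etl.connect(FieldRef("crm", "contacts", "phone"), FieldRef("warehouse", "dim_user", "phone"))
```

Making `FieldRef` frozen keeps it hashable, so field references can be collected into sets or used as dictionary keys when building lineage views.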
To create a transformation in Tree Schema, navigate to the Transformations page.

Select Create Transformation to move forward with creating a new transformation.

Similar to data stores and schemas, the transformations have a name, type and points of contact.

The last step is to simply define where the data comes from: the source(s), and where the data is going: the target(s). From the Select source schema(s) tab, first choose the data store, then the schema and fields you want to include in your transformation. Once you have selected the fields you want to include in your transformation hit Add to transformation diagram to place the fields into the transformation.
Tip: You can add more than one source schema at a time.

Next, choose Select target schema(s) at the top to select the target schemas and fields. This follows the same process.

Now, click the triangle on a source field and then click the triangle on the target field to connect them.
Tip: Click and release to create a connection, do not click and hold!

When you have completed creating your transformation, click I’m done, save transformation! to finalize your transformation.
Explore data lineage
When you have created transformations for your data assets, you can start to explore them with data lineage. Every data store, schema, field and transformation has a lineage tab in its details section. When you load the page for a data asset, all of the immediate data lineage connections for that asset will be displayed.
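Conceptually, lineage is a directed graph whose edges are the source-to-target links captured by transformations. Immediate connections are one hop away; exploring further up or downstream amounts to a graph traversal. The sketch below uses a breadth-first search over hypothetical `store.schema.field` names and is an illustration of the concept, not Tree Schema’s implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source field, target field), hypothetical naming

def build_index(edges: Iterable[Edge]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Index edges in both directions so lineage can be walked either way."""
    downstream: Dict[str, List[str]] = defaultdict(list)
    upstream: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return downstream, upstream

def reachable(start: str, adjacency: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search: every asset reachable from start, excluding start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # the seen set also guards against cycles
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

edges = [
    ("app_db.users.email", "warehouse.dim_user.email"),
    ("crm.contacts.email", "warehouse.dim_user.email"),
    ("warehouse.dim_user.email", "reports.churn.email"),
]
downstream, upstream = build_index(edges)
print(sorted(reachable("app_db.users.email", downstream)))
```

Indexing both directions once up front makes upstream and downstream queries symmetric: the same `reachable` function works on either index.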

From here you can explore up and downstream connections for your data lineage. A more fully-connected example is shown below:
Capture your team’s knowledge
We’ve now walked through how to capture and relate all of your data assets. In this section you’ll learn some of the ways that you can share knowledge about your data with your teammates.
Rich-text documentation
All data assets in Tree Schema (data stores, schemas, fields and transformations) can have rich-text documentation in their own README panel. Update the README to start sharing your knowledge!

Comments & conversations
Your users will have questions, and other users will often have the same questions later on. The comments section is a great way to let your users share additional information about your data. You can also attach files to comments, which is a great way to share common content such as queries, access instructions and more.

Assign experts
Knowing who uses your data is important; those who use certain data assets are generally the ones who can help others, too. In Tree Schema, all data assets have the following types of experts:
- Power Users: those who visit and use the data asset most often
- Volunteer Experts: those who volunteer as experts for the given data asset
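Volunteer experts are an explicit opt-in, but power users are derived from usage. The tutorial does not say exactly how Tree Schema ranks usage, so this sketch simply assumes a hypothetical visit log of `(user, asset)` pairs and ranks users by visit count.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def power_users(visits: Iterable[Tuple[str, str]], asset: str, top_n: int = 3) -> List[str]:
    """Rank users by how often they visited an asset (hypothetical log format)."""
    counts = Counter(user for user, visited_asset in visits if visited_asset == asset)
    # most_common orders by count; ties keep first-encountered order.
    return [user for user, _ in counts.most_common(top_n)]

visit_log = [("ana", "users"), ("ben", "users"), ("ana", "users"), ("cy", "orders"), ("ana", "orders")]
print(power_users(visit_log, "users"))  # ['ana', 'ben']
```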

Define your keywords in the data dictionary
You can use Tree Schema to capture the keywords that drive your business. When you define a keyword, you also create a context for it. A context is the scope within which the keyword has a particular meaning. Consider the example keyword “channel”.
For the marketing team a channel could be how the user came to your app. For the product team, it could be a segment of correlated user groups, and for the development team the channel could mean the type of pipeline that is processing the data.
To create a keyword, navigate to the dictionary and select Add New Keyword.

The keyword creation modal will be displayed; fill it out and save your keyword.

Every time you add a new keyword, its context is registered as a tag. In the example above we created the context “Marketing”, so the tag “Marketing” is now available to the rest of our data assets in Tree Schema. While this is not the only way to create tags, using the contexts from your keywords to pre-define your tags can be a great way to limit the scope and structure of your tags.
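The keyword/context relationship described in this section can be modeled by keying definitions on the pair `(keyword, context)`, so “channel” can carry a different meaning for each team, and by adding each context to a tag set as the keyword is saved. The `KeywordDictionary` class is a hypothetical in-memory illustration, not Tree Schema’s data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class KeywordDictionary:
    """Hypothetical in-memory dictionary; not Tree Schema's data model."""
    # (keyword, context) -> definition, so one keyword can have several meanings.
    definitions: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # Contexts double as tags that can later be applied to data assets.
    tags: Set[str] = field(default_factory=set)

    def add_keyword(self, keyword: str, context: str, definition: str) -> None:
        self.definitions[(keyword, context)] = definition
        self.tags.add(context)  # register the context as a tag

    def lookup(self, keyword: str) -> Dict[str, str]:
        """Return every definition of a keyword, keyed by context."""
        return {ctx: text for (kw, ctx), text in self.definitions.items() if kw == keyword}

glossary = KeywordDictionary()
glossary.add_keyword("channel", "Marketing", "How the user came to the app.")
glossary.add_keyword("channel", "Product", "A segment of correlated user groups.")
glossary.add_keyword("channel", "Development", "The type of pipeline processing the data.")
print(sorted(glossary.tags))  # ['Development', 'Marketing', 'Product']
```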
Tag your data assets
Tags can be used to group your data assets together. Common uses for tags include identifying data assets for a use case (e.g. marketing), finding all PII data, or organizing a structured set of training materials.
Every data asset in Tree Schema can have tags. To add a tag to an asset, just type in the tag you would like to add and either select an existing, similar tag or create a new one. In this example, I’ll navigate to a data asset and type the letter “M”. As you can see, the “Marketing” tag is available because we defined the word “channel” with the context “Marketing” above.

You can add as many tags as you would like to each data asset.

View assets by tag
Once your assets are tagged you can navigate to the tags page to view all assets by each of your tags.

Selecting a row will break out each type of asset that is associated with that tag:
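The tags page is effectively an inverted index: each asset carries a set of tags, and the view regroups assets by tag and then by asset type. This sketch assumes a hypothetical `(asset type, asset name)` key for each asset and is only an illustration of that regrouping.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Asset = Tuple[str, str]  # (asset type, asset name), hypothetical naming

def assets_by_tag(tagged: Dict[Asset, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Invert asset -> tags into tag -> asset type -> sorted asset names."""
    index: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for (asset_type, name), tags in tagged.items():
        for tag in tags:
            index[tag][asset_type].append(name)
    # Convert to plain dicts with sorted names for stable display.
    return {tag: {t: sorted(names) for t, names in by_type.items()} for tag, by_type in index.items()}

tagged_assets = {
    ("schema", "public.campaigns"): {"Marketing"},
    ("field", "public.users.email"): {"Marketing", "PII"},
    ("field", "public.campaigns.channel"): {"Marketing"},
}
print(assets_by_tag(tagged_assets)["PII"])  # {'field': ['public.users.email']}
```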

That’s it!
You've now gone through the core Tree Schema features. Make sure to check out the help & documentation for full details and more examples!