Skip to main content

Command Palette

Search for a command to run...

Treat Your Tables Like Cattle, Not Pets

break free of old habbits to scale a warehouse to thousands of tables

Updated
5 min read
Treat Your Tables Like Cattle, Not Pets

In 2006 when I first started working with servers, the company I worked at gave our private cloud infrastructure iconic, powerful, Greek-inspired names like Athena and Archimedes.

These machines ran for years, and I was in charge of maintaining them. They rarely got rebooted and it was really hard to remember exactly what configuration had been applied. I had nightmares about one of these long-lived things having a catastrophic failure. How quickly would I be able to rebuild it from scratch?

The problem started when we had multiple people with admin access making changes. This eventually led to an incident where a public port of our telephony system (Asterisk) was accidentally exposed and hackers made thousands of dollars worth of phone calls from China to Cuba. Oops. How could this incident be prevented in the future? It seems that our pet strategy wasn't working.

Around 2012, Randy Bias started the whole pets vs. cattle thing as a way of describing the difference between how on-premise stacks vs cloud stacks should be managed. Ultimately the disposability of servers is the key difference in management styles. The analogy isn't very vegan-friendly but it does hold.

My experience working with cloud databases for the last decade or so has taught me that cloud database objects should also be treated as disposable.

Why you should throw your tables* away...

\This applies to any object that is configured in your data store: databases, tables, views, users, roles, grants, really anything.*

My philosophy here is that one-off configuration changes are fast to make but become costly over time. For example; creating a table and then applying a grant statement means that the next person that creates that table has to remember to add the same statement and any statements applied after. This is compounded in large multi-admin systems when the configuration is constantly changed. Take this series of events and configuration changes:

When you have 100s of tables to manage, and dozens of users the problem starts to emerge. Here's what happens when you treat your data objects like pets:

  • The configuration will change over time, especially with many developers, causing errors and confusion.

  • You need to provision too much access to users, causing insecurity.

  • You will end up with things you didn't intend to, creating errors.

  • It's difficult to roll back changes or audit them, causing instability.

  • It would take you hours or days to restore things, causing downtime.

All of this means after some time with a complex system you will have security, stability, confusion and generally a hard-to-operate system. You'll be back where I was in 2006!

Treating Tables like Cattle

My experience working with cloud databases for the last decade has taught me that cloud database objects should also be treated as disposable

The idea is simple. Instead of applying configuration slowly over time, and expecting database objects to be long-lived; we apply configuration at build time and rebuild everything often. This means treating them as disposable and rebuilding them often, rather than making slow, incremental changes over time. By doing this, you can ensure that your database objects are consistently configured and avoid the problems that come with multiple people making changes to them.

The first step in this journey is to abstract configuration away from business logic. Business logic is usually expressed in a SELECT statement that might look like this

case when
 status = 'shipped' and
 return_flag is not true and
 payment_status = final
then 'settled' else 'active'
end

While configuration is things like this CREATE TABLE or GRANT

One way to think about it is to break down the sublanguages of SQL and group them into logic and configuration:

It's a simplistic bifurcation, one I'll explore in a future post.

Enter dbt™

My current employer has built dbt and dbt Cloud to help developers build, test, and maintain their data infrastructure. It allows users to define their data transformations and models in code, rather than as messy SQL statements or impossible-to-version GUIs. This makes it easier to version control changes to the data infrastructure and to automate the process of building and testing data changes.

Overall, dbt simplifies the process of building and maintaining a data infrastructure, making it easier for developers to work with data and reducing the risk of errors and downtime. I may be a bit biased; but I'm not alone, the thing that feels magic to me about dbt is that it lets you focus on logic, while configuration is done largely automatically under the hood. This makes it incredibly powerful.

Here are some of the ways legacy "pet" problems are overcome with the "cattle" approach:

Pet Farm IssueThe Cattle ApproachThe dbt™ Way
The configuration will change over time, especially with many developersConfiguration is stored in a file, which is version controlled.Configuration is largely done in dbt_project.yml
You need to provision too much access to usersConfiguration is applied by an automated bot, not end-users.The user running dbt build in production is elevated, developers aren't.
You will end up with things you didn't intend toConfiguration is easier to audit if it's in one placeChanges are tested, reviewed by analytics engineers and put into production by an orchestrator.
It's difficult to roll back changes or audit themConfiguration is stored in a file, vs piecemeal over time.Every change is version controlled in git, every change is logged.
It would take you hours or days to restore thingsSimply re-apply configuration and logic.A single dbt build can build an entire warehouse, configuration and logic.

In summary, to prevent configuration issues with your database objects, consider using a tool like dbt to manage them in code and follow the "cattle" approach by treating them as disposable. This will help you maintain a stable and secure system that is easy to scale.

To get started, quantify the problems that your "pet" approach is causing; is there evidence for outages caused? additional tickets created? difficult to hire senior staff? This helps build a strong why for change.

Footnotes:

¹Not all database objects are available via dbt natively today; some folks use scripts like these run via a run-operation, whilst others would opt for another declarative tool like Terraform to manage closer-to-infrastructure objects like databases.

More from this blog

P

Poolbeg Data Stacks

2 posts

Ernesto Ongaro is a data professional with over 15 years of experience. He has held various roles in data analysis, warehousing and product marketing. Currently a Solutions Architect for dbt Labs.

Treat Your Tables Like Cattle, Not Pets