Google Spanner: When Do You Need to Move to It?

Written by alexanderchetverin | Published 2023/09/11
Tech Story Tags: software-development | database | sql-database | database-management | google-spanner | what-is-google-spanner | php-development | google-spanner-pros-and-cons

TLDRKnow about what Google Spanner is, its advantages & disadvantages, in what scenarios it’s better to use vs. other cloud solutions, and how to migrate to it.via the TL;DR App

As a seasoned PHP developer, I utilise Google Cloud Spanner when I need a scalable database solution for web applications. I typically reach for Spanner when global distribution, strong consistency, and high availability are imperative.

Throughout my career in PHP development, I’ve worked with several database services. Among other things, I’ve grown to be quite familiar with Google Spanner and its robust features.

I appreciate Spanner’s blend of global scalability and classic SQL traits. The practicality of bridging scalability and structure makes Google Spanner perfect for projects that require a distributed database.

In this article, I’ll describe what Google Spanner is, its advantages & disadvantages, in what scenarios it’s better to use vs. other cloud solutions, and how to migrate to it.

What Is Google Spanner?

Google Cloud Spanner is a globally distributed, horizontally scalable relational database service offered by Google Cloud Platform (GCP). Spanner is designed to provide the best of both traditional relational databases and NoSQL databases.

Additional advantages to using Spanner are global scalability and strong consistency.

Some key features of Google Cloud Spanner include:

  • Horizontal Scalability: Spanner automatically scales your database horizontally across multiple regions and zones. This enables high availability and low-latency access to your data.

  • Global Distribution: By replicating your data across multiple regions, Spanner lets you serve users from different parts of the world with low-latency data access.

  • Strong Consistency: Unlike many other NoSQL databases, Spanner offers strong consistency guarantees. Transactions are ACID-compliant, and data is always up-to-date across all replicas.

  • SQL Compatibility: Spanner supports a subset of SQL. This makes it easy for developers familiar with relational databases to work with it.

  • Automatic Sharding: Spanner automatically breaks down data into smaller units called "splits" and distributes them across nodes to ensure efficient data management and query execution.

  • Automatic Backups and Replication: Spanner automatically backs up your data and handles replication across zones and regions. This enhances data durability and availability.

  • Synchronous Replication: Changes are replicated to multiple regions before a write is acknowledged. This gives you strong consistency and minimal data loss.

  • High Availability: Spanner is designed with built-in redundancy and fault tolerance, minimizing downtime and providing high availability.

  • Multi-Version Concurrency Control (MVCC): MVCC enables concurrent transaction management. This means transactions are executed without blocking each other.

Google Spanner Pros and Cons

Based on my long-term experience using Google Spanner, here are some pros and cons I’ve discovered:

Pros:

  • Spanner combines horizontal scalability seamlessly with SQL fundamentals. This includes robust transactional capabilities and data integrity assurance.

  • It’s highly suitable for multi-regional or global distributed projects, including websites or applications. Spanner is a great fit for projects involving worldwide transactions and those leveraging Google's infrastructure.

  • Spanner provides higher control over data storage, writing, and retrieval strategies. But this can be a drawback if your usage requirements change as it might require adjustments to the storage architecture.

Cons:

  • If your project doesn't require distribution and scalability, alternative solutions, e.g., MySQL, will offer better performance benefits compared to Google Spanner.

  • Spanner provides a limited range of data types, lacks stored procedures, аnd supports only serializable isolation levels. Plus, it does not support advanced SQL statements like UPDATE with JOIN. I feel Spanner is less intuitive with standard SQL techniques for implementing business logic.

  • Google Spanner is still quite raw. For example, a JSON column will not work correctly if it contains float values. Additionally, the gRPC server implementation for PHP is not well-optimized. I’ve seen this impact performance with tables featuring a significant number of columns. Developers should be ready for these and other unexpected difficulties.

When Should You Move to Google Spanner?

Google Spanner outperforms substantially when dealing with high-traffic endeavours backed by a robust core on the Google Cloud Platform. I see it as a bridge between a sophisticated database logic that craves more than NoSQL but isn't entangled in unnecessary complexity. When I’m working on a project that spans multiple regions, Google Spanner works well.

From what I’ve observed, Google Spanner is ideal for the following types of projects:

  1. High-traffic projects with substantial clustered data, where the core business logic is already hosted on or intended for the Google Cloud Platform.

  2. Projects where the database's business logic is advanced enough to go beyond NoSQL capabilities, yet not overly intricate.

  3. When global distribution is a necessity for the project's requirements.

Things to Consider Before Migrating to Google Spanner

While I won’t go into all the details of migration, you should assess the complexity of your migration before anything else. This means gathering data about your source database.

Consider query patterns, the amount of application logic that's dependent on database features (e.g., stored procedures and triggers), hardware requirements, and total cost of ownership (TCO).

Here’s the complete Migration Process:

  1. Collect all queries to the current database for each endpoint.

  2. Analyze whether the data is clusterizable, i.e., whether the data could be broken down into small predefined groups. An example of an ideal case is when users only have access to their own data or their company's data. If partitioning is not possible and you need to select collections of rows using filters, Google Spanner is not a good fit.

  3. Google Spanner clusters data based on the initial bytes of the primary key. If the existing primary keys do not result in the desired data clustering, consider adding a technical column at the beginning of the primary key to achieve the desired data grouping during storage.

    Additionally, if auto-increment was previously used for generating IDs, you will need to use hash functions to transform them into UUIDs.

  4. If, within a single endpoint, you need to retrieve data from multiple tables connected by one-to-one or one-to-many relationships, consider using interleaved tables. Utilize LEFT JOIN queries for one-to-one relationships and ARRAY (SELECT AS STRUCT ...) constructions when dealing with one-to-many relationships.

  5. If you're using PHP and your tables contain many columns, consider storing data from columns that are not involved in WHERE clauses by converting it to one or several JSON strings. The column type should be STRING.

  6. Migrate the schema.

  7. Migrate the application. To take advantage of interleaved tables, you will likely need to add an additional abstraction layer to query the framework you are using.

  8. Test and tune your performance. Please note that tests on small volumes may not be representative.

  9. Migrate the data.

  10. Validate the migration.

  11. Configure cutover and failover mechanisms.

When planning a migration, you’ll need to account for size instances and app preparation. Make sure you have sufficiently provisioned the instance for your workload by performing a load test on the instance with some synthetic representative workload. Next comes bulk migration and validation.

You may want to explore the option of using Google Dataflow for data transfer, keeping in mind that Dataflow requires Python or Java, as it does not support PHP. If you are migrating from a non-Google Cloud source, refer to your current database’s documentation regarding how best to perform bulk migration.

And finally, you’ll then need to warm up Spanner if you can’t activate your application to users in stages.

Google Cloud provides an in-depth overview of Cloud Spanner migration here. Other migration tips can be found here.

Final Thoughts About Google Spanner

In my experience with Google Spanner, the fusion of global scalability and traditional SQL principles is quite useful. Its knack for juggling multi-region configurations stands out, and it helps make potential complexity more routine.

Spanner is more than just a database; it's a strategic tool for projects aiming to wield scalability without sacrificing data integrity. Google Spanner leaves me convinced that it's not just about storing data; it's about skillfully orchestrating performance and reliability.


Written by alexanderchetverin | Senior PHP Developer
Published by HackerNoon on 2023/09/11