“Modernizing” databases

We work with modern databases, and also help our clients modernize their database setup. This can involve many different services.

Database version upgrade

We have done many projects where we migrate a critical application with a few TB of data from, say, Postgres 9 to Postgres 15, or make a similar jump across major versions.

This sort of project does not involve any deliberate change to the schema. We say “deliberate” because some small schema changes are forced on the team by the version differences. The data types of certain columns in certain tables may change, either because the earlier datatype is deprecated or because there are major benefits in moving some columns to a new datatype or a new index type.

Our team will perform the data and stored procedure migration, test that existing applications work correctly with the new database version, operate a parallel run for a specified number of days or weeks, and then do the final cutover and migration.

Database migration

These are more ambitious projects where we migrate an application from one DBMS product to another. Relational databases are not as different from one another as their marketing brochures claim. We have migrated applications from MS SQL Server to Postgres, from Oracle to Postgres, and so on.

The aim of such migrations is always to make minimal changes in the application source code. In some cases, the only change needed is to rebuild the application code with a new set of DB drivers for the new database. In most cases, small changes are needed to accommodate superficial differences in column datatypes.

Beneath the surface of the DBMS, a lot of complexity needs to be handled:

  • The datatypes need to be mapped; if strange, non-standard datatypes are used in the source database, they are unlikely to be supported by the target, in which case suitable alternatives are discussed and decided.
  • Data is migrated, often by writing scripts that read record by record from the source and insert into the target.
  • Stored procedures and triggers are migrated and extensively tested.
  • Applications are rebuilt to connect to the new database, and another round of tests is done.
  • If any ETL processes were loading data in batch mode into the original database, those need to be tested, and perhaps modified, to load into the target database.
  • Transaction boundaries and transaction semantics are scrutinised and tested: if the application depended on some unusual semantics in the original database, it must continue to produce correct results with the target database. The same comparison is repeated for record locking.
  • Database clusters are set up so that the new database delivers the same high availability, data replication, read-only replicas, or DR mirroring as was present earlier.
  • Finally, at the time of cutover, the size of the dataset may require careful resource planning to keep downtime to a minimum. Sheer data volume may push this duration beyond 12 hours in some cases.
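The record-by-record copy mentioned above can be sketched as follows. This is a minimal illustration using SQLite connections as stand-ins for both ends; a real migration would use the vendors' own drivers (with their own placeholder styles), and TYPE_MAP is a hypothetical fragment of the datatype-mapping exercise, not a complete mapping.

```python
import sqlite3

# Hypothetical fragment of a source-to-target type map (e.g. MS SQL
# Server types to Postgres types) -- illustrative only; a real project
# builds and reviews this table per schema.
TYPE_MAP = {
    "DATETIME": "timestamptz",
    "NVARCHAR": "text",
    "MONEY": "numeric(19,4)",
    "BIT": "boolean",
}

def migrate_table(src, dst, table, columns, batch_size=1000):
    """Copy a table record by record, committing in batches so a
    failure never loses more than one batch of work."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    cur = src.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
            rows,
        )
        dst.commit()
        copied += len(rows)
    return copied
```

Batched commits are a deliberate choice here: a single huge transaction on the target can exhaust resources, while per-row commits are too slow for multi-TB datasets.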

We have executed projects with 40TB of data and tens of thousands of lines of stored procedure code. Those systems are live today and have proven every bit as stable as the original systems.
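To make the cutover-downtime point concrete, a back-of-envelope calculation (with assumed, illustrative throughput figures, not numbers from a specific project) shows how the final copy alone can run past 12 hours at this scale:

```python
# Back-of-envelope estimate of final-copy duration at cutover.
def replication_hours(dataset_tb, effective_mb_per_s):
    """Hours to copy dataset_tb terabytes at a sustained effective
    throughput of effective_mb_per_s megabytes per second."""
    mb = dataset_tb * 1024 * 1024  # TB -> MB
    return mb / effective_mb_per_s / 3600

# A 40 TB dataset at a sustained 800 MB/s through the whole pipeline
# (read, network, write, indexing) takes roughly 14.5 hours.
hours = replication_hours(40, 800)
```

Sustained end-to-end throughput is usually far below raw disk or network speed, which is why careful resource planning, and often a pre-copy with incremental catch-up, is needed to keep the outage window acceptable.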

Database architecture

Sometimes, the architecture of a database system or cluster needs to be extended to deliver additional functionality. This may include:

  • Setting up of a high availability cluster, starting with a single DB server
  • Setting up of one or more read-only replicas of a transaction database, so that the application may direct its read-only accesses to the replicas and reduce the master database load
  • Setting up of database mirrors at DR sites
  • Setting up automated backups, generated and shipped out to cloud object stores like AWS S3 Glacier Deep Archive

In these cases, our team may modify the application code too, or Remiges may work in collaboration with another team, with our team setting up only the databases.

New databases

More extensive surgery may be required in systems where simply scaling up the core relational database is not adequate. This may involve:

  • Setting up a non-relational database system to take some of the data access load. Such systems include Elasticsearch and MongoDB
  • Migrating an application from a classical relational database with hard ACID properties to a distributed database with multi-master architecture and eventual consistency, such as CouchDB or MongoDB. We feel that classical relational databases are chosen too blindly and too widely, when many application areas are better served by a distributed database with eventual-consistency semantics.

In such projects, applications may need some redesign and significant code rewrites for the migration to succeed. Often, only some tables of a database are migrated, leaving the system operating in a hybrid database environment where a classical hard-consistency RDBMS continues to be used for synchronous transaction processing.

Data outside databases

Large enterprise applications sometimes need to keep large volumes of data as documents, e.g. PDF, JPEG images, etc. These are often scanned images of authoritative originals like photographs, scanned passports, scanned certificates, etc. We have seen such data stores exceeding 200TB, bringing about problems of secure backup, scalability of the underlying file systems, and overall manageability.

We have executed projects where such data stores have been migrated to object stores, conceptually similar to AWS S3, either in an on-premise data centre or in the cloud. An object store is neither as structured and rigid as a table-structured database, nor as ad hoc and difficult to manage as a file system with each document stored as a separate file. Open-source object store products like MinIO and Ceph provide excellent management features, often more robust and scalable than conventional file systems.
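One small design decision in such a migration is how file paths map onto object keys. The sketch below uses a hypothetical tenant/document-id convention; any S3-compatible client (MinIO's SDK, boto3, etc.) would then store each document's bytes under the computed key instead of writing a file.

```python
import hashlib
import posixpath

def object_key(tenant, doc_id, filename):
    """Map a document to an object-store key. A short hash prefix
    spreads keys evenly, which avoids the hot, deeply skewed directory
    trees that plague large file systems."""
    prefix = hashlib.sha256(doc_id.encode()).hexdigest()[:4]
    return posixpath.join(prefix, tenant, doc_id, filename)
```

Because the key is derived deterministically from document metadata, the migration script can be re-run safely and lookups need no separate path index.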

Postgres and EnterpriseDB

We work with all popular relational databases, including MySQL, MS SQL Server, and Oracle. But we have a strong affinity for Postgres, and strongly recommend it wherever the choice of database has not been preordained. Remiges have executed numerous projects on Postgres, and managed and operated them under extremely stressful loads.

For large enterprise deployments, where risk management is a critical issue, EnterpriseDB offers the peace of mind and world-class support which make Postgres even more attractive. Remiges are EnterpriseDB partners.

In summary

Databases are living animals, and applications which need to grow and adapt to changing business contexts cannot afford to keep their original data architecture unchanged. We can help.

And in this respect, we are not a run-of-the-mill coding company. The software services industry has thousands of small and medium companies which operate merely as collections of coders. Our skills and track record allow us to assert that we build systems and execute projects; the story never stops at the code.

