Using Git for Magento Website Maintenance


By Gary Mort


Maintaining active Magento websites using Git

There are many articles on using Git for Magento; however, these articles focus on Git as a development tool. While this is an extremely good programming practice, it does not address website maintenance. At best, it can be used as one part of a maintenance process. At worst, when your website crashes, you will discover that your Git repositories are missing a large number of items that are critical for restoring or migrating your website.

This is one in a series of articles on maintaining your Magento website using Git. In this article, I will focus on explaining the configuration used to restore your website files. In addition, this article provides basic security investigation protocols to follow if your website is hacked.

Executive Summary:

Why use Git?

Because of Git’s reputation as a developer tool, many website owners do not see a reason for using Git on their production websites. Git is a distributed version control system. It describes itself as “the stupid content tracker” (man git). That means Git does not care about file contents. It maintains a copy of every file in the directory in which it is set up, and whenever a changed file is committed, a new copy of that file is saved as well. These copies are all stored locally on the same system as your website, which makes Git very fast. The distributed nature of Git means it is easy to create remote copies of the Git database. And because it is a developer tool, it provides fast and extensive tools for comparing multiple copies of a Git database to determine the differences between them.

In short, this means that Git is also an incremental backup system. Instead of asking yourself if you want version control, ask yourself if you need a backup system. The answer to that question is almost always yes. And if you are looking for a backup system, do you want an incremental backup system, one that creates backups routinely so that your website is always saved in its most up-to-date form? Again, the answer is almost always yes. Even if the answer is no, it is possible to use Git as a full backup system despite its incremental nature. In a future article I will explain how to set this up.
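As a rough illustration of the incremental backup idea, the sketch below records the current state of a site directory as one commit, and does nothing when no file has changed. It assumes the directory is already a Git repository with a committer identity configured; the function name `snapshot_site` and the commit message format are hypothetical and not taken from the web maintenance repository.

```shell
# Hypothetical helper: save the current state of a site directory as one
# incremental snapshot. Name and message format are illustrative only.
# Usage: snapshot_site <site-dir>
snapshot_site() {
    site_dir="$1"
    # Stage every addition, modification, and deletion in the working tree.
    git -C "$site_dir" add --all || return 1
    # "diff --cached --quiet" exits 0 when nothing is staged.
    if git -C "$site_dir" diff --cached --quiet; then
        echo "no changes"
    else
        git -C "$site_dir" commit --quiet \
            -m "snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            && echo "snapshot saved"
    fi
}
```

Run from a scheduler such as cron, a call like `snapshot_site /var/www/magento` (the path is an example) would add a new commit only when the site has changed since the previous run.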

Best Practices are bad.

A common mistake made when implementing Git for Magento, or for any other website maintenance, is to follow “best practices” for using Git. While it seems self-evident that “best practices” should be followed at all times, this standard set of rules does not readily apply to website maintenance. To make matters more confusing, the Git core team is often the one recommending these “best practices”. It seems as though they would be able to provide the best advice, but unfortunately their priorities are different from yours.

The Git core team is concerned with version control for programming languages that are compiled. Even the documentation for the project is compiled! This compilation is generally done in the same directory where the source code is stored. It generates a number of interim files stored side by side with the source code, while the final compiled code is stored separately in the build directory. This means that when using Git, they want to keep all those compiled files out of the repository, and the standard advice is to use ignore files in order to avoid adding those files to the Git database.

Because the files we are maintaining can all be considered source code, these standard default practices are not appropriate.

Implementation Summary

In order to configure Git for restoring a website, we will use a number of lesser-known Git features. As such, you can find the implementation details in our web maintenance GitHub repository, along with a number of implementation tools, including setup and configuration tools which provide step-by-step instructions and semi-automate the process.

While you are working, make sure that you do not ignore any files. Ignoring files will almost certainly result in missing an important file, and your security could be compromised as a result. Instead, use sparse checkouts to leave out files you do not want to restore.
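A minimal sketch of a sparse restore follows, assuming the site is restored from a clone into a fresh directory. Every file stays tracked in the repository; the skipped directories are simply not written to disk. The function name `sparse_restore` and its arguments are hypothetical, and the tools in the web maintenance repository may do this differently.

```shell
# Hypothetical helper: restore a site into a fresh directory while leaving
# some top-level directories out of the working tree.
# Usage: sparse_restore <repo> <dest> <dir-to-skip>...
sparse_restore() {
    repo="$1"; dest="$2"; shift 2
    # Clone the history without populating the working tree yet.
    git clone --quiet --no-checkout "$repo" "$dest" || return 1
    git -C "$dest" config core.sparseCheckout true
    mkdir -p "$dest/.git/info"
    # Start from "everything", then subtract each directory to skip.
    {
        echo '/*'
        for skip in "$@"; do
            echo "!/$skip/"
        done
    } > "$dest/.git/info/sparse-checkout"
    # Fill the index and working tree, honouring the sparse patterns.
    git -C "$dest" read-tree -mu HEAD
}
```

For example, `sparse_restore /backups/site.git /var/www/restore media var` (example paths) would restore everything except the `media` and `var` directories. Avoid applying the same patterns inside a live site directory, because updating the working tree there removes the excluded files from disk.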

Security:

  1. Configure Git to encrypt sensitive files, such as local.xml, when they are checked in.
  2. When performing a security audit, check the log files for suspect files.
  3. Prevent developers from cloning the repository to GitHub by creating a fake initial commit containing a 150MB file. GitHub rejects large files, so when that file is in the very first commit, GitHub will reject the commit and abort the connection.
  4. Linking your existing development sites: When configuring Git for existing development sites, do not perform a checkout of your production repository. Instead, you can fake a checkout from the production repository and then add only the files which have changed in development.
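One common way to fake the checkout described in the last item is a mixed reset, sketched below. This is an assumption about the technique, not necessarily the method used in the web maintenance repository; the function name `link_dev_site` and the remote name `production` are hypothetical.

```shell
# Hypothetical helper: attach an existing development directory to the
# production repository without overwriting any file on disk.
# Usage: link_dev_site <dev-dir> <production-repo-url> <branch>
link_dev_site() {
    dev_dir="$1"; prod_url="$2"; branch="$3"
    git -C "$dev_dir" init --quiet || return 1
    # "production" is an illustrative remote name.
    git -C "$dev_dir" remote add production "$prod_url" || return 1
    git -C "$dev_dir" fetch --quiet production || return 1
    # A mixed reset points the local branch and the index at the production
    # commit but leaves the files on disk alone: the "fake checkout".
    git -C "$dev_dir" reset --quiet "production/$branch" || return 1
    # Whatever is listed here differs from production.
    git -C "$dev_dir" status --short
}
```

After this runs, files identical to production show no changes, while files modified or created in development are listed as modified or untracked and can be reviewed and committed on their own.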

Technical details, as well as some additional tips and hints, can be found in our GitHub repository. The initial configuration is not perfect, and future articles will expand on it. However, later articles will focus on adding new features, not changing the initial configuration, so it is safe to start with the initial configuration and expand it.

Systems Manager