There is a lot of confusion about how to run regular maintenance on logs, caches, and indexes in Magento. Most of it stems from applying maxims from the past to today. In the past, server storage space was at a premium, and offsite storage required tape backups to handle the quantity of data, which also made that data difficult to access. The modern maxim is “disk space is cheap”, and it is!
The other main issue is that “disk space is cheap” does not mean you should never clear data out of your active files and directories. The more files there are in a single directory, the slower the system is when accessing them, even when it only needs a single file! Extremely large files, over 1GB, become problematic in a number of ways: they are difficult to analyze, difficult to spot check for problems, and under some methods of access, such as memory-mapped files, they can become a memory issue. While you should keep your log files, don’t make your server deal with them while it is also processing client requests.
There are very valid business reasons for not retaining logs, such as privacy laws in Europe. There are also very valid business reasons for retaining all information for a long period of time, such as the Sarbanes-Oxley Act of 2002. So consider these business needs carefully when deciding what to keep and for how long. From a personal perspective, it seems to me that every time old log data has been purged for good, within 2 weeks there is suddenly a need for it and I have to explain why it was deleted. The most stress-inducing incidents are when you find out the information is needed about an hour after it was deleted.
Percona distributes a free set of tools for working with MySQL and MySQL-like databases (https://www.percona.com/software/mysql-tools/percona-toolkit). Included in this toolkit is pt-archiver, which is designed to migrate small chunks of data at a time from active tables to archive tables. This allows you to keep your active database log tables under 32,000 records or so without performing expensive large move operations. Depending on your needs, you can archive anywhere from 100 to 1000 records an hour. You can also set a lower limit, through pt-archiver’s --where condition, under which nothing is archived, so you always have access to the last 10,000 records.
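The “keep the newest rows, archive a small batch” idea can be sketched in Python as below. Everything specific here is an assumption: the database names magento and magento_archive, the table example_log and its auto-increment key log_id are hypothetical placeholders, the archive table is assumed to already exist with the same columns, and the caller is assumed to have looked up the current MIN(log_id) and MAX(log_id). The helper computes an exclusive upper bound on the key that never reaches the newest keep rows and caps each run at per_run ids, then builds a pt-archiver command using its --source, --dest, --where, --limit and --commit-each options. It only prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def archive_upper_bound(min_id: int, max_id: int, keep: int, per_run: int) -> Optional[int]:
    """Exclusive upper bound on the key for this run, or None if nothing qualifies.

    Ids from max_id - keep + 1 upward are the newest `keep` ids and stay live.
    At most `per_run` ids (starting at min_id) are archived per run; gaps in
    the id sequence mean fewer rows may actually move, never more.
    """
    protected_from = max_id - keep + 1          # first id that must stay live
    upper = min(protected_from, min_id + per_run)
    return upper if upper > min_id else None    # nothing old enough to archive


def pt_archiver_argv(upper: int, chunk: int = 100) -> List[str]:
    """Build a pt-archiver command line; DSN names are hypothetical placeholders."""
    return [
        "pt-archiver",
        "--source", "h=localhost,D=magento,t=example_log",
        "--dest", "h=localhost,D=magento_archive,t=example_log",
        "--where", f"log_id < {upper}",
        "--limit", str(chunk),   # rows fetched and archived per statement
        "--commit-each",         # commit after each chunk to keep transactions small
        "--statistics",          # print a summary when finished
    ]


if __name__ == "__main__":
    # Example: keep the newest 10,000 rows, move at most 1,000 per run.
    upper = archive_upper_bound(min_id=1, max_id=50_000, keep=10_000, per_run=1_000)
    if upper is not None:
        print(shlex.join(pt_archiver_argv(upper)))
```

Run hourly from cron, per_run sets the archiving rate (100 or 1000 rows an hour, for example) and keep sets the floor below which nothing is touched.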
As your archive tables grow, they too will require periodic maintenance. However, I really don’t like automated maintenance scripts for them. We want to know when and where the data was sent, and we want to be able to adjust the schedule based on current server activity. A good starting point is manual archiving every 3 months, adjusted as needed. When removing the archive data, export it to a file and store it securely on a cloud service. Securely means scrubbing any sensitive data and encrypting the file. There really is no excuse not to do both: you don’t need live access to the data, and you can download and decrypt it easily if you ever need it.
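The scrubbing half of that advice can be sketched with Python’s standard csv and hmac modules. This is an illustration, not a complete anonymization scheme: the columns email and remote_addr are hypothetical examples of sensitive fields, and a keyed HMAC is assumed so that the same value always maps to the same token (handy for correlating rows) without the original being stored. Encryption is deliberately left to a separate, well-vetted tool before upload.

```python
import csv
import hashlib
import hmac
import io
from typing import Iterable, Sequence

# Hypothetical: the columns treated as sensitive in this example export.
SENSITIVE = ("email", "remote_addr")


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest; same input and key give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scrub_csv(rows: Iterable[dict], fieldnames: Sequence[str], key: bytes) -> str:
    """Write rows as CSV text with the sensitive columns pseudonymized."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        clean = dict(row)
        for col in SENSITIVE:
            if clean.get(col):  # leave empty or missing values alone
                clean[col] = pseudonymize(clean[col], key)
        writer.writerow(clean)
    return out.getvalue()


if __name__ == "__main__":
    key = b"replace-with-a-secret-key"  # hypothetical; never store it with the export
    rows = [{"log_id": "1", "email": "jane@example.com", "remote_addr": "203.0.113.7"}]
    print(scrub_csv(rows, ["log_id", "email", "remote_addr"], key))
```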
Caches & Indexes:
Unlike log files, caches and indexes hold nothing that needs to be kept, since Magento can regenerate them, yet they tend to grow too large over time. A simple nightly cleaning task using Magerun makes for easy maintenance.
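A minimal sketch of such a nightly task, written as a Python wrapper a cron job could call. It assumes a Magento 1 install with n98-magerun available as n98-magerun.phar; the install path /var/www/magento is a hypothetical placeholder, and cache:clean and index:reindex:all are the n98-magerun commands assumed here. Magento 2 installs use n98-magerun2, whose command set differs, so check your version’s command list before relying on these names.

```python
import subprocess
import sys
from typing import List

# Hypothetical values: adjust the binary name and Magento root for your server.
MAGERUN = "n98-magerun.phar"
MAGENTO_ROOT = "/var/www/magento"

# Assumed n98-magerun (Magento 1) commands for the nightly clean-up.
NIGHTLY_COMMANDS = ["cache:clean", "index:reindex:all"]


def nightly_argvs(magerun: str = MAGERUN, root: str = MAGENTO_ROOT) -> List[List[str]]:
    """One argv per maintenance command, each pointed at the Magento root."""
    return [[magerun, f"--root-dir={root}", cmd] for cmd in NIGHTLY_COMMANDS]


def run_nightly() -> None:
    """Run each command in order; check=True raises on failure so it shows up in cron's output."""
    for argv in nightly_argvs():
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default: print the commands. Pass --run to execute them.
    if "--run" in sys.argv[1:]:
        run_nightly()
    else:
        for argv in nightly_argvs():
            print(" ".join(argv))
```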
Log Files:
Here we can use a tried and well-tested Linux utility, logrotate: https://www.linuxcommand.org/man_pages/logrotate8.html. Logrotate makes it easy to move the current log file into an easily accessible backup, and to move older backups into compressed backups. Logrotate will take the current log file, for example system.log, and move it to a new file called system.log.1, so it stays easily accessible in the same directory as the current log. Before moving the log file to system.log.1, it moves system.log.1 to system.log.2, and so on right up the chain. Logrotate can also gzip the rotated logs: the compress directive does this for every rotation, and delaycompress leaves the most recent one uncompressed. If you want a week’s worth of data readily accessible, so that system.log.7 is rotated to system.log.8.gz and everything from 8 onward is compressed, stock logrotate has no single directive for that, so it takes a small postrotate script or a separate job.
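To make the renaming chain concrete, here is a small Python simulation of one numbered rotation, using logrotate’s default naming, which appends the number to the full file name (system.log.1). It only models names: rotations numbered above a hypothetical keep_plain threshold (7 here) get a .gz suffix, and it ignores the rotate limit that eventually deletes the oldest copy. It does not touch files and is not how logrotate itself is implemented.

```python
from typing import List, Tuple


def rotated_name(base: str, n: int, keep_plain: int = 7) -> str:
    """Name of rotation n of `base`; numbers above keep_plain are gzipped."""
    name = f"{base}.{n}"
    return f"{name}.gz" if n > keep_plain else name


def rotation_plan(base: str, existing: int, keep_plain: int = 7) -> List[Tuple[str, str]]:
    """(old, new) renames for one rotation, highest number first.

    `existing` is how many rotated copies (base.1 .. base.N) already exist.
    A step whose old name is plain and whose new name ends in .gz also compresses.
    """
    plan = [
        (rotated_name(base, n, keep_plain), rotated_name(base, n + 1, keep_plain))
        for n in range(existing, 0, -1)
    ]
    plan.append((base, rotated_name(base, 1, keep_plain)))  # live log becomes .1
    return plan


if __name__ == "__main__":
    for old, new in rotation_plan("system.log", existing=8):
        print(f"{old} -> {new}")
```

Working from the highest number down is what keeps each rename from overwriting the next file in the chain.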
Most Magento systems are already configured to rotate system.log and exception.log on a nightly basis. However, it is recommended to check your other log files too and, in general, rotate everything on a nightly basis, whether it is very small or large. This makes it extremely simple to compare log data for the same time period later on. If you notice something odd in system.log.32.gz and want to look at extensionName.log for the same night, you know precisely which file to open: extensionName.log.32.gz.
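One way to rotate every Magento log at once is a single logrotate stanza with a wildcard. The sketch below renders such a stanza from Python so it can be templated per install; the log directory /var/www/magento/var/log is a hypothetical placeholder. daily, rotate, missingok, ifempty, compress, delaycompress and copytruncate are standard logrotate directives. ifempty is spelled out because rotating even empty logs is what keeps the numbers aligned across files, copytruncate is an assumption so PHP can keep appending to the same path, and rotate 365 keeps a year of daily copies.

```python
from typing import List


def logrotate_stanza(log_dir: str, keep: int = 365) -> str:
    """Render a logrotate stanza that rotates every *.log file in log_dir nightly."""
    directives: List[str] = [
        "daily",           # rotate every log every night
        f"rotate {keep}",  # rotated copies kept before the oldest is deleted
        "missingok",       # no error if a log does not exist yet
        "ifempty",         # rotate empty logs too, keeping numbers aligned across files
        "compress",        # gzip rotated copies...
        "delaycompress",   # ...except the most recent one (.1 stays plain text)
        "copytruncate",    # assumption: PHP keeps writing to the same path
    ]
    body = "\n".join(f"    {d}" for d in directives)
    return f"{log_dir.rstrip('/')}/*.log {{\n{body}\n}}\n"


if __name__ == "__main__":
    print(logrotate_stanza("/var/www/magento/var/log"), end="")
```

The output can be saved under /etc/logrotate.d/ and checked with logrotate -d, which runs in debug mode without changing anything. If logrotate complains that the parent directory has insecure permissions, its su directive sets the user and group to rotate as.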
Again, it is not good practice to discard log data, so you do not want logrotate deleting it. It is highly recommended to configure it to delete files only after a year. A maintenance pass every 3 months that removes older logs from the server keeps the file count down. That schedule must also be adjusted for the number of log files being rotated, because we still do not want an excessive number of files in a single directory! If there are 100 different log files, nightly rotation generates 700 files a week; over the course of 6 weeks, that becomes 4200. There is no hard technical rule behind this, just long experience that despite the many claims of different file systems to handle millions of files, none of them live up to that promise. Personally, I’ve never run into problems when the number of files is below 1000, so that is the number I try not to exceed.
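The file-count arithmetic above is easy to redo for your own setup. This small Python helper reproduces the example of 100 logs rotated nightly and estimates how many days of rotated copies fit under the 1000-file ceiling; like the text, it counts only rotated copies, not the live log files themselves.

```python
def rotated_files(n_logs: int, days: int) -> int:
    """Rotated copies accumulated after `days` nightly rotations of n_logs files."""
    return n_logs * days


def days_under_limit(n_logs: int, limit: int = 1000) -> int:
    """Most whole days of nightly rotation that stay at or under `limit` files."""
    if n_logs <= 0:
        raise ValueError("n_logs must be positive")
    return limit // n_logs


if __name__ == "__main__":
    print(rotated_files(100, 7))    # 700: one week
    print(rotated_files(100, 42))   # 4200: six weeks
    print(days_under_limit(100))    # 10
    print(days_under_limit(10))     # 100
```

With 100 logs the ceiling is reached in about 10 days, well before a 3-month pass; with 10 logs it takes about 100 days, which roughly matches a quarterly schedule. That gap is why the maintenance interval has to follow the number of log files.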
A careful review of your business needs and unique system configuration is needed to determine what archiving policy meets your needs. Make sure to balance the extremes of keeping everything with the active data on the server vs aggressively deleting everything in pursuit of mythical performance gains.