14
votes

Firstly, I apologise for the sheer size of this question as I'm sure what I am proposing is a "big deal" in terms of implementing and probably could be three or four separate questions in itself. I wouldn't ask if I weren't in desperate need of help.

I have been given the monumental task of revising my company's risk management procedures in regards to our online work.

As we take no backups and do nothing to protect our data, I have decided that we are going to protect our work through source control, as anyone involved in professional programming should already be doing. I currently do this locally with Git, but the others use no source control, so we lose a lot of the benefits that source control offers. I'd rather we had a system where everyone uses Git and which enforces the rule that if it's not in source control, it doesn't stay. Obviously we're going to need a backup plan too, but as a developer I suppose the first thing to do is to sort out the coding side before getting a backup solution in place. Any advice on that is more than welcome as well.

We run an ASP.NET website with a SQL Server 2005 backend, running Sitecore as our CMS of choice. In an ideal world I would like to have all the changing parts of this CMS site under source control, including the database.

At the moment, and I know this isn't the greatest idea, I run one solution for ALL sublayouts built in Sitecore. This is under source control and thanks to Git I've been able to add branches and push new features and fix bugs easily (using Git-flow as my workflow solution). I'm still quite new to Git though, so I've not managed anything too complex outside of committing, ignoring certain files, etc.

On top of this, I would also like to get the database contents under source control. As I understand it, you can serialise Sitecore content items as a huge tree within the file system (saved as .item files, if I remember correctly). If this is the ideal solution I would like to add these to source control too, although I don't know exactly where they would be saved on the file system. My file system right now is like this:

- Data      (Logs, indexes, etc - is this needed to be in source control?)
- Source    (Helper files, although occasionally modified)
- Website   (Containing all the files I edit, and other essential Sitecore stuff)
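To make the "what goes in the repository" part of the question concrete, here is a minimal sketch of one possible ignore setup. It is only an illustration: the sub-folder names (Data/logs, Data/indexes, Data/serialization) and the sample files are assumptions modelled on the listing above, not the real layout, and it runs in a throwaway directory rather than on the site.

```shell
#!/bin/sh
set -eu

# Throwaway directory standing in for the site root (hypothetical layout).
SITE="$(mktemp -d)"
cd "$SITE"
git init -q .

# Folder and file names below are assumptions modelled on the listing above.
mkdir -p Data/logs Data/indexes Data/serialization Source Website
touch Data/logs/log.txt Data/indexes/index.bin
touch Data/serialization/home.item Website/Sample.ascx

# Ignore runtime data that can presumably be regenerated (logs, indexes);
# everything else, including serialised .item files, stays trackable.
cat > .gitignore <<'EOF'
/Data/logs/
/Data/indexes/
EOF

# git check-ignore prints each given path that matches an ignore rule.
git check-ignore Data/logs/log.txt Data/indexes/index.bin

# Stage the rest and list what would end up in the first commit.
git add --all
git ls-files
```

Whether logs and indexes really are safe to leave out is exactly the open question here, so treat the two ignore rules as a starting point for discussion.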

As mentioned already my current repository is only on my system, and it consists of a single solution folder with a bunch of .ascx, .ascx.cs, .ascx.cs.designer and the odd .aspx file or two. This tends to make my life easier when uploading as, like with the

What I would like input on is an ideal way of managing this for all developers. Despite using a DVCS I would prefer to have the live server viewed as the main repository and for all the other developers to push and pull from it, and each other. We'll be using the git-flow workflow solution as it conforms to our way of development nicely. What I'm worried about, obviously, is setting this up correctly without destroying what's currently a very expensive, high-traffic site on a server with no backup.

Tips and advice on how much of the data on the server to stick in the repository, guidance on how to handle the serialised data in Sitecore and potentially how to use the source control itself as a way of backing up to a separate repository would be welcomed. This is the first time I've had to build a source control system/workflow for a live website, so any guidance and advice on what would be the best thing for me to do would be much appreciated.


EDIT: I am going to put a bounty on this to try and get more guides on how people handle Sitecore with Git.

To clarify myself, I am NOT looking for a way to back up my work, rather a way so that a number of developers can work on it and ensure that the code on the website is up to date with a central repository. For example, I have referenced before that I will be using git-flow to manage my workflow. The origin repo will exist on a shared server (which in time will likely be a test environment), and all developers will have clones of that to work on and to push to. From here, I want to be able to push changes from the origin repo on the shared drive to the live server and back again if errors are found. I would also like to include serialised content items in my repo.


4 Answers

11
votes

Check out HedgeHog's Team Development Solutions (3.0 is the latest version). It meets many of your needs when used with Visual Studio, Sitecore Rocks, and Team City (or another build server). Visit http://hhogdev.com/ for more details.

4
votes

Revised answer after revised question:

OK, let me expand on my original idea now that we have more background information from you. Since you say that you only have one license for Sitecore and can't have a separate test server etc., we can always modify this slightly and still achieve the same effect.

What if you had several repositories on the same server that runs the live Sitecore? This assumes you can set Sitecore up to use different roots/repositories on the same file system, e.g. so that changing the URL to http://yoursite.com/blahblah/test runs Sitecore in test mode. Whether that is possible depends on what kind of license you have, of course, i.e. whether it's tied to a specific machine. Anyway, this way you could test your site on another branch (e.g. a develop branch in a test repository) before you merge the stuff into master and let it go live.

So you could have a bare repository on the server, which everyone pushes to and pulls from. You could also have two additional non-bare repositories on the same server, one with the master branch checked out and the other with the develop branch checked out. By logging in via ssh you can easily run "git pull" in the test repository if you want to test new functionality on the test version of your Sitecore site. When you are happy with the changes, merge into master, push to master on your bare repo, and update the live repo in the same way.
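The three-repository layout can be tried out safely in a scratch directory first. The following is a rough sketch under stated assumptions: all paths, the file page.txt and the identity "Example Dev" are made up for illustration, and a local temporary directory stands in for the server.

```shell
#!/bin/sh
set -eu
ROOT="$(mktemp -d)"   # stand-in for a directory on the server

# Bare repository that every developer pushes to and pulls from.
git init -q --bare "$ROOT/site.git"
git --git-dir="$ROOT/site.git" symbolic-ref HEAD refs/heads/master

# A developer repository that seeds the master and develop branches.
git init -q "$ROOT/dev"
cd "$ROOT/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # hypothetical identity
git config user.email "dev@example.com"
git remote add origin "$ROOT/site.git"
echo "v1" > page.txt
git add page.txt
git commit -q -m "Initial commit"
git branch develop
git push -q origin master develop

# Two non-bare checkouts on the server: live (master) and test (develop).
git clone -q --branch master  "$ROOT/site.git" "$ROOT/live"
git clone -q --branch develop "$ROOT/site.git" "$ROOT/test"

# New work goes to develop and is pulled into the test checkout only.
git checkout -q develop
echo "v2" > page.txt
git commit -q -am "New feature"
git push -q origin develop
git -C "$ROOT/test" pull -q --ff-only

# Once it looks good on test: merge into master, push, update live.
git checkout -q master
git merge -q develop
git push -q origin master
git -C "$ROOT/live" pull -q --ff-only

cat "$ROOT/live/page.txt"
```

Until the final pull, the live checkout keeps serving the old content, which is the whole point of the separation.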

I think you need to try and find a way to have two versions of your site, so you can test the changes before they go live.


Original post:

I strongly suggest that you have a separation between the live server and what you are currently working on, i.e. another repository that you push your work to (and pull from), which works as an integration repository. This way you can integrate code and test it locally (local to your organization) before you push it to the live server, so no one accidentally pushes code/databases/whatever directly to the live server.

I'd also recommend that you take backups of the data for the central repository, in other words, git should be used as a version control system, not as a backup system. Even git might fail and cause corrupted repositories, and then you are smoked if you don't have any backups anyway.

Also, if it's possible, try to separate actual site content from the logic working on the data, i.e. try to keep a good model/view concept. This way you can easily set up a test environment with test databases that is independent of the code, and there is no need to commit databases. Unless you really want to commit them of course :)

2
votes

When I step back and try to see what you are doing, it is managing the risk of losing business continuity. For example, if your site goes down, you would ideally want to be able to use your backup to completely and automatically restore the site.

Data storage is not expensive. Really, it isn't. Even when you have a huge data set. And therefore, requiring all developers to use git is a good idea. Many organizations set up a single git server, and you just push/pull to that server. If your solution has good and complete tests, you'll know very quickly if code merges have broken the software. If not, you should probably use a central git server for developers to push/pull, and then a separate release integration server which uses "git fetch" to retrieve changes from the central server so that they can be merged and tested there.
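A rough sketch of what the release integration step might look like, assuming a local scratch setup: "git fetch" on its own only downloads the new commits, so a merge and a test run follow it. The run_tests function, the file status.txt and all paths are hypothetical stand-ins for a real test suite and real repositories.

```shell
#!/bin/sh
set -eu
ROOT="$(mktemp -d)"

# Stand-in for the central repository that developers push to.
git init -q "$ROOT/central"
cd "$ROOT/central"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # hypothetical identity
git config user.email "dev@example.com"
echo "ok" > status.txt
git add status.txt
git commit -q -m "Initial commit"

# Release integration checkout, cloned from the central repository.
git clone -q "$ROOT/central" "$ROOT/integration"

# New work lands on the central repository afterwards.
echo "ok, with new feature" > status.txt
git commit -q -am "Add feature"

# Hypothetical test suite: passes when status.txt starts with "ok".
run_tests() { grep -q '^ok' "$1/status.txt"; }

cd "$ROOT/integration"
BEFORE="$(git rev-parse HEAD)"
git fetch -q origin                    # download only, working tree untouched
git merge -q --ff-only origin/master   # integrate the fetched commits
if run_tests "$ROOT/integration"; then
  echo "tests passed, ready to release"
else
  git reset -q --hard "$BEFORE"        # undo the merge on this checkout
  echo "tests failed, merge rolled back"
fi
```

Keeping the fetch and the merge as separate steps gives you a chance to inspect what arrived (for example with "git log HEAD..origin/master") before anything changes on the integration checkout.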

You have outlined a variety of components of your solution, such as backing up code, data, database, CMS entries, et cetera. However, the overall question you should ask is whether you have gathered enough stuff to be able to completely activate your site just from the backup. If you can't do that, you haven't done enough. If you can do that, you have done enough.

On the licensing issue, you need a better license. Ask the owner of Sitecore for a license for a test server, and tell them it is for a test server. A good vendor will realize that if they help you in this way, you are more likely to renew your subscription. Or, ask your finance people for another Sitecore license. If your Sitecore-based CMS site is such a great utility for your company, another license won't change that benefit by much.

0
votes

I spent an hour learning what I could about Sitecore (never heard of it before today, cool tool), and I think I understand what you're trying to do. Here's how I would do it:

  1. Set up a blessed bare repo on a Linux box (physical or virtual, shouldn't matter).
  2. Initialize a git repo on the production environment, at the base folder where all source code, configuration files and data can be added to the repo. (Remember to set a user.name and user.email that will identify the sysadmin taking care of production updates.)
  3. Create a SQL Server dump file for each schema, and put all the dumps in a specially marked folder inside the local production repo.
  4. Stage all files (git add --all) and commit with an "Initial commit, version X.Y.Z" comment where the version number is whatever you consider the current deployment to be.
  5. Push to the blessed repo.
  6. Clone the blessed repo on all dev environments. (Again remember to set user.name and user.email for proper identification in commits.)
  7. Start using git-flow.
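Roughly, the steps above could look like this on the command line. This is a sketch in a scratch directory, not a recipe for the real server: the paths, names, e-mail addresses and version number are invented, and the SQL Server dump is replaced by a placeholder file because the actual dump command depends on your setup.

```shell
#!/bin/sh
set -eu
ROOT="$(mktemp -d)"   # scratch area standing in for both machines

# Step 1: blessed bare repository.
git init -q --bare "$ROOT/blessed.git"
git --git-dir="$ROOT/blessed.git" symbolic-ref HEAD refs/heads/master

# Step 2: initialise a repo at the base folder of the production site.
mkdir -p "$ROOT/production/Website"
echo "<%-- sample sublayout --%>" > "$ROOT/production/Website/Sample.ascx"
cd "$ROOT/production"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Production Sysadmin"    # hypothetical identity
git config user.email "sysadmin@example.com"

# Step 3: one dump per schema in a clearly marked folder.
# Placeholder only; a real dump would come from SQL Server tooling.
mkdir -p db-dumps
echo "-- placeholder for a SQL Server dump" > db-dumps/core.sql

# Step 4: stage everything and commit with a version number.
git add --all
git commit -q -m "Initial commit, version 1.0.0"

# Step 5: push to the blessed repo.
git remote add origin "$ROOT/blessed.git"
git push -q origin master

# Step 6: clone on a dev environment and set the developer identity.
git clone -q "$ROOT/blessed.git" "$ROOT/dev1"
git -C "$ROOT/dev1" config user.name "Developer One"
git -C "$ROOT/dev1" config user.email "dev1@example.com"

# Step 7: git-flow is a separate tool whose "git flow init" sets up a
# develop branch; plain git is used here to avoid the extra dependency.
git -C "$ROOT/dev1" checkout -q -b develop

git -C "$ROOT/dev1" log --oneline
```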

Note step 3, where SQL Server dumps are created and added to the repo. This will allow you to roll back any future changes to the database(s). Keep in mind you will need to generate new dumps every time you make changes, but only for schemas that actually changed. Serialize what you need to serialize, and commit the new serializations along with the schema changes. That way, git revert {commit_hash} is all you need to revert those specific changes in the repo (without forgetting to restore the database from the reverted dump, of course).
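To illustrate the revert, here is a small self-contained sketch. The file names (db-dumps/site.sql, serialization/new.item) and their contents are placeholders I made up; the point is only that one commit carries the dump and the serialized items together, so one revert undoes both in the working tree.

```shell
#!/bin/sh
set -eu
REPO="$(mktemp -d)"
cd "$REPO"
git init -q .
git config user.name "Example Dev"          # hypothetical identity
git config user.email "dev@example.com"

# Baseline: a dump file standing in for a real SQL Server dump.
mkdir -p db-dumps serialization
echo "schema v1" > db-dumps/site.sql
git add --all
git commit -q -m "Initial commit, version 1.0.0"

# One commit that changes the schema dump and adds a serialized item.
echo "schema v2" > db-dumps/site.sql
echo "new content item" > serialization/new.item
git add --all
git commit -q -m "Schema change plus serialized items"
BAD_COMMIT="$(git rev-parse HEAD)"

# Revert that commit; history is kept and a new "Revert" commit is added.
git revert --no-edit "$BAD_COMMIT"

# The dump file is back at "schema v1" and new.item is gone again.
cat db-dumps/site.sql
```

The revert only touches files in the repo. Loading the restored dump back into SQL Server is still a separate, manual step.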

I hope this helps, directly or indirectly.

P.S: I don't know your database structure; if configurations are stored in the same schema as the user data, it definitely complicates things since you can't just restore a previous dump without losing new (and presumably important) user data. I'm used to Ruby on Rails where schema revisions are coded for up- and downgrades alike; I hope your framework provides a similar feature.