To prepare your TFS Project Collection for migration, you may want to remove (stale) old data to reduce the database size first.
Delete old workspaces
Deleting workspaces and shelvesets can reduce your migration and upgrade times considerably. either use the
tf commandline or leverage a tool like the TFS SideKicks to identify and delete these.
Not just build results, but often overlooked the actual build records can take up a considerable amount of data. Use
tfsbuild destroy (XAML) to permanently delete the build records. In the past, I've encountered clients who had 1.8 million "hidden" builds in their database and removing them shaved off quite a considerable amount of data. These records were kept around for the warehouse.
Old team projects
Of course, destroying old team projects can give back a lot of data. Anything you don't need to send to azure helps. You could also consider splitting the collection and to leave behind the old projects. That will give you the option to detach that collection and store it somewhere, should you ever need that data again.
Deleted branches are a very common hidden size hog. When deleting things in TFVC, they are not actually deleted, they're just hidden. Finding deleted files and especially old development or feature branches can give you back a lot of data. Use
tf destroy to get rid of them.
You may also want to look for checked in nuget package folders, those can quickly rack up a lot of space as well.
Ohh yes, especially when you use test attachments, these can grow like crazy, depending on your TFS version either use the built-in test attachment cleanup features or use the Test Attachment Cleaner from the TFS power tools.
The build definitions themselves won't take a lot of database space, but the build results may. But those have been covered in a previous section.
In the past, I've had to patch tfbuid.exe to handle (very) large amounts of build records, as it tends to try and fetch all build data locally before proceeding with the delete action. You may need to rely on the TFS Client Object Model to achieve a similar result.
You may have data in your git repositories that are no longer accessible due to force pushes or deleted branches. It's also possible that certain data in Git could be packed more efficiently. To clean your repositories you have to clone them locally, clean them up, delete the remote repo from TFS and push the cleaned copy to a new repository (you can use the same name as the old one). Doing this will break references with existing build definitions and you will have to fix these up. While you're at it, you could also run the BFG repo Cleaner and convert the repositories to enable Git-LFS support to handle large binary files in your repositories more elegantly.
git clone --mirror https://tfs/project/repo # optionally run BFG repo cleaner at this point git reflog expire --expire=now --all git gc --prune=now --aggressive git repack -adf # Delete and recreate the remote repository with the same name git push origin --all git push origin --tags
Work item (attachments)
Work items can gather up a considerable amount of data, especially when people start attaching large attachments to them. You can use
witadmin destroywi to delete work items with unreasonably large attachments. To retain the work item, but delete its attachments you can delete the attachments from the current work item and then clone it. After cloning, destroy the old work item to allow the attachments to be cleaned up.
Old work items that you no longer need (say the sprint items from 6 years ago) can also be deleted. My colleague René has a nice tool that allows you to bulk-destroy by first creating the appropriate work item query.
Be sure to run the cleanup jobs
TFS often doesn't directly prune data from the database, in many cases, it just marks stuff as deleted for latest processing. To force the cleanup to happen immediately, run the following stored procedures on your Project Collection database:
EXEC prc_CleanupDeletedFileContent 1 # You may have to run the following command multiple times, the last # parameter is the batch size, if there are more items to prune than the # passed in number, you will have to run it multiple times EXEC prc_DeleteUnusedFiles 1, 0, 100000
Other useful queries
To identify how much data is stored in each section, there are a few useful queries you can run. The actual query depends on your TFS version, but since you're preparing for migration I suspect you're on TFS 2017 or 2018 at the moment.
Find the largest tables:
SELECT TOP 10 o.name, SUM(reserved_page_count) * 8.0 / 1024 SizeInMB, SUM( CASE WHEN p.index_id <= 1 THEN p.row_count ELSE 0 END) Row_Count FROM sys.dm_db_partition_stats p JOIN sys.objects o ON p.object_id = o.object_id GROUP BY o.name ORDER BY SUM(reserved_page_count) DESC
Find the largest content contributors:
SELECT Owner = CASE WHEN OwnerId = 0 THEN 'Generic' WHEN OwnerId = 1 THEN 'VersionControl' WHEN OwnerId = 2 THEN 'WorkItemTracking' WHEN OwnerId = 3 THEN 'TeamBuild' WHEN OwnerId = 4 THEN 'TeamTest' WHEN OwnerId = 5 THEN 'Servicing' WHEN OwnerId = 6 THEN 'UnitTest' WHEN OwnerId = 7 THEN 'WebAccess' WHEN OwnerId = 8 THEN 'ProcessTemplate' WHEN OwnerId = 9 THEN 'StrongBox' WHEN OwnerId = 10 THEN 'FileContainer' WHEN OwnerId = 11 THEN 'CodeSense' WHEN OwnerId = 12 THEN 'Profile' WHEN OwnerId = 13 THEN 'Aad' WHEN OwnerId = 14 THEN 'Gallery' WHEN OwnerId = 15 THEN 'BlobStore' WHEN OwnerId = 255 THEN 'PendingDeletion' END, SUM(CompressedLength) / 1024.0 / 1024.0 AS BlobSizeInMB FROM tbl_FileReference AS r JOIN tbl_FileMetadata AS m ON r.ResourceId = m.ResourceId AND r.PartitionId = m.PartitionId WHERE r.PartitionId = 1 GROUP BY OwnerId ORDER BY 2 DESC
If file containers are the issue:
SELECT CASE WHEN Container = 'vstfs:///Buil' THEN 'Build' WHEN Container = 'vstfs:///Git/' THEN 'Git' WHEN Container = 'vstfs:///Dist' THEN 'DistributedTask' ELSE Container END AS FileContainerOwner, SUM(fm.CompressedLength) / 1024.0 / 1024.0 AS TotalSizeInMB FROM (SELECT DISTINCT LEFT(c.ArtifactUri, 13) AS Container, fr.ResourceId, ci.PartitionId FROM tbl_Container c INNER JOIN tbl_ContainerItem ci ON c.ContainerId = ci.ContainerId AND c.PartitionId = ci.PartitionId INNER JOIN tbl_FileReference fr ON ci.fileId = fr.fileId AND ci.DataspaceId = fr.DataspaceId AND ci.PartitionId = fr.PartitionId) c INNER JOIN tbl_FileMetadata fm ON fm.ResourceId = c.ResourceId AND fm.PartitionId = c.PartitionId GROUP BY c.Container ORDER BY TotalSizeInMB DESC
Are there other ways to clean and prepare your projects prior to upgrade or migration that I may have missed, leave a comment below!