To prepare your TFS Project Collection for migration, you may want to remove (stale) old data to reduce the database size first.
Delete old workspaces
Deleting workspaces and shelvesets can reduce your migration and upgrade times considerably. either use the
tf commandline or leverage a tool like the TFS SideKicks to identify and delete these.
You can list & clean up workspaces using the following commands:
tf vc workspaces /owner:* /computer:* /format:detailed
Then delete the workspaces from your server with:
tf vc workspace /delete
The same applies to shelvesets:
List all shelvesets:
tf vc shelvesets /format:detailed
Then delete a shelveset:
tf vc shelve /delete
If you want to script these commands using PowerShell, you can also use
/format:xmland parse the results.
Not just build results, but often overlooked the actual build records can take up a considerable amount of data. Use
tfsbuild destroy (XAML) to permanently delete the build records. In the past, I've encountered clients who had 1.8 million "hidden" builds in their database and removing them shaved off quite a considerable amount of data. These records were kept around for the warehouse.
In case you run into very slow execution of
tfsbuild, you may need this patched version it has a few fixes that prevent it rom trying to download all the build logs for every build in your system just to get to the build ids.
Old team projects
Of course, destroying old team projects can give back a lot of data. Anything you don't need to send to azure helps. You could also consider splitting the collection and to leave behind the old projects. That will give you the option to detach that collection and store it somewhere, should you ever need that data again.
Deleted branches are a very common hidden size hog. When deleting things in TFVC, they are not actually deleted, they're just hidden. Finding deleted files and especially old development or feature branches can give you back a lot of data. Use
tf destroy to get rid of them.
You may also want to look for checked in nuget package folders, those can quickly rack up a lot of space as well.
Code Lens index
Team Foundation Server 2013 introduced server side indexing of TFVC controlled files to allow Visual Studio access to data on who changed which files when directly in the UI. This server-side index can grow quite quickly depending on the size of your code base and churn.
You can control the index through the
tfsconfig codeindex command. You can specify up to how long ago you want to index
/indexHistoryPeriod:#months, delete the index altogether
/destroyCodeIndex or exclude specific problematic files
Code Lens is also referenced to as Code Sense and Code Index internally in the product.
In case deleting the index times out, you read this post on StackOverflow with additional guidance. Be careful, manually running SQL on the TFS collection DB's isn't supported.
Ohh yes, especially when you use test attachments, these can grow like crazy, depending on your TFS version either use the built-in test attachment cleanup features or use the Test Attachment Cleaner from the TFS power tools.
The build definitions themselves won't take a lot of database space, but the build results may. But those have been covered in a previous section.
In the past, I've had to patch tfbuid.exe to handle (very) large amounts of build records, as it tends to try and fetch all build data locally before proceeding with the delete action. You may need to rely on the TFS Client Object Model to achieve a similar result.
You may have data in your git repositories that are no longer accessible due to force pushes or deleted branches. It's also possible that certain data in Git could be packed more efficiently. To clean your repositories you have to clone them locally, clean them up, delete the remote repo from TFS and push the cleaned copy to a new repository (you can use the same name as the old one). Doing this will break references with existing build definitions and you will have to fix these up. While you're at it, you could also run the BFG repo Cleaner and convert the repositories to enable Git-LFS support to handle large binary files in your repositories more elegantly.
git clone --mirror https://tfs/project/repo # optionally run BFG repo cleaner at this point git reflog expire --expire=now --all git gc --prune=now --aggressive git repack -adf # Delete and recreate the remote repository with the same name git push origin --all git push origin --tags
Work item (attachments)
Work items can gather up a considerable amount of data, especially when people start attaching large attachments to them. You can use
witadmin destroywi to delete work items with unreasonably large attachments. To retain the work item, but delete its attachments you can delete the attachments from the current work item and then clone it. After cloning, destroy the old work item to allow the attachments to be cleaned up.
Old work items that you no longer need (say the sprint items from 6 years ago) can also be deleted. My colleague René has a nice tool that allows you to bulk-destroy by first creating the appropriate work item query.
The Nuget, npm and especially Universal Packages can take up quite a bit of space. In this seperate post I've shared a SQL stateent to figure out which feeds and packages take up the most space.
Be sure to run the cleanup jobs
TFS often doesn't immediately prune data from the database, in many cases, it just marks stuff as deleted for later processing. To force the cleanup to happen immediately, run the following stored procedures on your Project Collection database:
EXEC prc_CleanupDeletedFileContent 1 # You may have to run the following command multiple times, the last # parameter is the batch size, if there are more items to prune than the # passed in number, you will have to run it multiple times EXEC prc_DeleteUnusedFiles 1, 0, 100000
100000 at the end is the number of marked items to process. If you've deleted a lot of content you may need to run that last procedure a couple of times before everything will be deleted.
Other useful queries
To identify how much data is stored in each section, there are a few useful queries you can run. The actual query depends on your TFS version, but since you're preparing for migration I suspect you're on TFS 2017 or 2018 at the moment.
Find the largest tables:
SELECT TOP 10 o.name, SUM(reserved_page_count) * 8.0 / 1024 SizeInMB, SUM( CASE WHEN p.index_id <= 1 THEN p.row_count ELSE 0 END) Row_Count FROM sys.dm_db_partition_stats p JOIN sys.objects o ON p.object_id = o.object_id GROUP BY o.name ORDER BY SUM(reserved_page_count) DESC
Find the largest content contributors:
SELECT Owner = CASE WHEN OwnerId = 0 THEN 'Generic' WHEN OwnerId = 1 THEN 'VersionControl' WHEN OwnerId = 2 THEN 'WorkItemTracking' WHEN OwnerId = 3 THEN 'TeamBuild' WHEN OwnerId = 4 THEN 'TeamTest' WHEN OwnerId = 5 THEN 'Servicing' WHEN OwnerId = 6 THEN 'UnitTest' WHEN OwnerId = 7 THEN 'WebAccess' WHEN OwnerId = 8 THEN 'ProcessTemplate' WHEN OwnerId = 9 THEN 'StrongBox' WHEN OwnerId = 10 THEN 'FileContainer' WHEN OwnerId = 11 THEN 'CodeSense' WHEN OwnerId = 12 THEN 'Profile' WHEN OwnerId = 13 THEN 'Aad' WHEN OwnerId = 14 THEN 'Gallery' WHEN OwnerId = 15 THEN 'BlobStore' WHEN OwnerId = 255 THEN 'PendingDeletion' END, SUM(CompressedLength) / 1024.0 / 1024.0 AS BlobSizeInMB FROM tbl_FileReference AS r JOIN tbl_FileMetadata AS m ON r.ResourceId = m.ResourceId AND r.PartitionId = m.PartitionId WHERE r.PartitionId = 1 GROUP BY OwnerId ORDER BY 2 DESC
If file containers are the issue:
DECLARE @partitionId INT = 1 SELECT CASE WHEN ArtifactUri LIKE 'vstfs:///%' THEN SUBSTRING(ArtifactUri, 10, CHARINDEX('/', ArtifactUri, 10) - 10) ELSE ArtifactUri END, SUM(cast(ci.FileLength as decimal(38)))/1024.0/1024.0 AS SizeInMb, COUNT(*) AS Records, d.DataspaceIdentifier FROM tbl_Container c JOIN ( SELECT ci.FileId, ci.DataspaceId, MAX(ci.FileLength) AS FileLength, MAX(ci.ContainerId) AS ContainerId FROM tbl_ContainerItem ci WHERE ci.PartitionId = @partitionId GROUP BY ci.FileId, ci.DataspaceId ) AS ci ON ci.ContainerId = c.ContainerId JOIN tbl_Dataspace d ON d.DataspaceId = ci.DataspaceId WHERE c.PartitionId = @partitionId AND d.PartitionId = @partitionId GROUP BY CASE WHEN ArtifactUri LIKE 'vstfs:///%' THEN SUBSTRING(ArtifactUri, 10, CHARINDEX('/', ArtifactUri, 10) - 10) ELSE ArtifactUri END, d.DataspaceIdentifier ORDER BY SizeInMb DESC
Are there other ways to clean and prepare your projects prior to upgrade or migration that I may have missed, leave a comment below!