Cleaning unused assets out of an Unreal Engine git repo
Problem
I was working on an Unreal Engine project that used git for version control. In the beginning, I uploaded all kinds of assets, but in the end, only half of them were used. As a result, the git repository contained a lot of unused files inside the Content directory. And so, I raised a question:
"How can I reduce the size of my project and git repository while keeping the commit history intact?"
Solution
There are many approaches to solving this problem, but for the sake of jotting down this solution for my future self, this is the one I took:
First, I created a new Unreal Engine project and migrated the main map into it, which brought along all the assets, mechanics, etc. that the map uses. Right away, the project was half its previous size. I manually copied the source code that was still in use (the migration process does not copy it). Then I launched the project to see if everything was still working. Everything was correct.
Next, using WinMerge, I compared the Content directory of the old project with that of the new one. It displayed all the differences between the two directories.
Then I generated a report file. Its form did not fit my needs, as I needed a simple list of paths that were NOT in the new directory. The raw version of the report looked like this:
Compare D:\git\OldProject\Content with D:\git\NewProject\Content\Content
10.01.2023 18:05:23
Filename,Folder,Comparison result,Left Date,Right Date,Extension
ThirdPerson,__ExternalActors__,Left only: D:\git\OldProject\Content\__ExternalActors__,* 10.01.2023 14:49:21, ,
Maps,__ExternalActors__\ThirdPerson,Left only: D:\git\OldProject\Content\__ExternalActors__\ThirdPerson,* 05.01.2023 10:50:52, ,
ThirdPersonMap,__ExternalActors__\ThirdPerson\Maps,Left only: D:\git\OldProject\Content\__ExternalActors__\ThirdPerson\Maps,* 10.01.2023 14:49:43, ,
MI_QuestMapProDemo_WoodPanel-Large.uasset,QuestMap\Demo\Materials,Left only: D:\git\OldProject\Content\QuestMap\Demo\Materials,* 05.01.2023 10:52:32, ,uasset
MI_QuestMapProDemo_WoodPanel-Small.uasset,QuestMap\Demo\Materials,Left only: D:\git\OldProject\Content\QuestMap\Demo\Materials,* 05.01.2023 10:52:32, ,uasset
Str_QuestMapPro_DirectionData.uasset,QuestMap\Structures,Binary files are different, 05.01.2023 10:52:32,* 10.01.2023 16:26:35,uasset
Str_QuestMapPro_LandmarkData.uasset,QuestMap\Structures,Binary files are different, 05.01.2023 10:52:32,* 10.01.2023 16:26:36,uasset
Str_QuestMapPro_MapVisibilityFlags.uasset,QuestMap\Structures,Binary files are different, 05.01.2023 10:52:32,* 10.01.2023 16:26:36,uasset
...
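And so I wrote a short Python script that turns this report into a plain list of file paths:
from pathlib import PureWindowsPath
# Append, so that paths already listed in the output file are kept
with open("report.txt", "r") as report, open("files_i_dont_want.txt", "a") as output:
    for line in report:
        fields = line.split(",")
        if len(fields) < 6:
            continue  # skip the report header lines, which are not CSV rows
        ext = fields[5].strip()  # get only lines pointing to files, skip directories
        not_in_new_project = fields[4].strip() == ""  # no right-hand date
        if ext and not_in_new_project:
            # git paths use forward slashes, also on Windows
            path = PureWindowsPath(fields[1], fields[0]).as_posix()
            output.write(f"Content/{path}\n")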
The resulting files_i_dont_want.txt looked like this:
QuestMapPro.png
ToImport/Button.fbx
Platforms/HoloLens/Config/HoloLensEngine.ini
Content/__ExternalActors__/ThirdPerson/Maps/ThirdPersonMap/D/SY/Z4CTQ4LG3YV10EKPD0UE8Q.uasset
Content/__ExternalActors__/ThirdPerson/ThirdPersonMap/0/49/2CCD6BLWT4JNP9PXKWWIY7.uasset
Content/__ExternalActors__/ThirdPerson/ThirdPersonMap/1/29/JSFSL6YKZU8XHYQWSN320J.uasset
...
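The WinMerge report and the post-processing step could probably be folded into a single Python script. The following is only a minimal sketch of that idea, not the method used above: the function name files_only_in_old is made up for illustration, and it only reproduces the "Left only" rows of the report (files that exist in the old directory but not in the new one), ignoring files whose contents differ.

```python
from pathlib import Path


def files_only_in_old(old_root, new_root):
    """Return POSIX-style relative paths of files that exist under
    old_root but have no counterpart at the same place under new_root."""
    old_root, new_root = Path(old_root), Path(new_root)
    missing = []
    for path in sorted(old_root.rglob("*")):
        if not path.is_file():
            continue  # directories are not listed, only the files in them
        relative = path.relative_to(old_root)
        if not (new_root / relative).exists():
            # as_posix() gives forward slashes, which is what git uses
            missing.append(relative.as_posix())
    return missing
```

Calling files_only_in_old with the old and the new Content directories and writing each result with a Content/ prefix should give the same kind of list as the one above.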
Next, I found a couple of scripts for cleaning git histories and removing unused files. In the end, the most practical turned out to be git-filter-repo, because it can read the list of paths to delete from a file. And so, I fetched a fresh clone of my repository and made a copy of it as a backup.
git clone git://example.com/some-big-repo.git
Next, I ran git-filter-repo with the following parameters:
python git-filter-repo.py --invert-paths --paths-from-file "files_i_dont_want.txt"
It went through all commits and wrote a new history without the listed files, then repacked the repository to drop the unneeded objects. The whole operation took a fraction of a second. I could immediately see the difference in the number of objects in the console log (2314 received by the clone, 1327 left after the rewrite):
Receiving objects: 100% (2314/2314), 2.11 MiB | 6.08 MiB/s, done.
Total 1327 (delta 126), reused 1327 (delta 126), pack-reused 0
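To double-check that the listed files are really gone from every commit, the paths recorded in the rewritten history can be compared against files_i_dont_want.txt. This is a small sketch under a few assumptions: the helper names paths_in_history and still_present are made up for illustration, it has to be run against the filtered clone, and paths containing unusual characters may still be printed quoted by git.

```python
import subprocess


def paths_in_history(repo_dir="."):
    """Return every file path touched by any commit reachable from any ref."""
    log = subprocess.run(
        ["git", "-c", "core.quotePath=false",
         "log", "--all", "--pretty=format:", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # The empty format leaves blank lines between commits; drop them.
    return {line for line in log.splitlines() if line}


def still_present(unwanted_paths, history_paths):
    """Return the unwanted paths that still occur in the history, sorted."""
    return sorted(set(unwanted_paths) & set(history_paths))
```

Reading files_i_dont_want.txt line by line and passing the stripped lines together with paths_in_history() to still_present should return an empty list if the cleanup worked.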
Next, I ran the following commands to expire old reflog entries and run the garbage collector, removing the unneeded objects for good:
git reflog expire --expire=now --all
git gc --prune=now --aggressive
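To put a number on the result, git count-objects -v reports how many objects are packed (in-pack) and how much disk space the packs take (size-pack, in KiB). Below is a minimal sketch for collecting these figures, for example once in the backup copy and once in the cleaned clone; the function names parse_count_objects and repo_stats are hypothetical.

```python
import subprocess


def parse_count_objects(text):
    """Turn the 'key: value' lines printed by `git count-objects -v`
    into a dict mapping each key to an int."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_stats(repo_dir="."):
    """Run `git count-objects -v` in repo_dir and return the parsed numbers."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_count_objects(result.stdout)
```

Comparing repo_stats() for the two directories shows the difference in in-pack and size-pack without having to read it off the console log.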
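Finally, I pushed the whole repository back to GitHub. git-filter-repo removes the origin remote as a safety measure, so it has to be added again first: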
git remote add origin git://example.com/some-big-repo.git
git push --set-upstream -f origin main
Ultimately, my repository decreased in size by half, and everything worked perfectly.
Conclusions
The next time I need to perform the same operation, I will have its steps written down here. Who knows, maybe it will help you, too. And if you know a better way of comparing two directories without using WinMerge and Python, let me know in the comments below. Thanks!