changeset 654:f221c7b5bdfb

git_handler: use convert_list to cache git objects

getnewgitcommits() does a weird traversal where a particular commit SHA is visited as many times as the number of parents it has, effectively doubling object reads in the standard case with one parent. This patch makes the convert_list a cache for objects, so that a particular Git object is read just once. On a mostly linear repository with over 50,000 commits, this brings a no-op hg pull down from 70 seconds to 38, which is close to half the time, as expected. Note that even a no-op hg pull currently does a full DAG traversal -- an upcoming patch will fix this.
author Siddharth Agarwal <sid0@fb.com>
date Tue, 18 Feb 2014 20:22:13 -0800
parents 4ab616864329
children baba2cf03d41
files hggit/git_handler.py
diffstat 1 files changed, 5 insertions(+), 2 deletions(-)
--- a/hggit/git_handler.py
+++ b/hggit/git_handler.py
@@ -620,7 +620,11 @@
                 todo.pop()
                 continue
             assert isinstance(sha, str)
-            obj = self.git.get_object(sha)
+            if sha in convert_list:
+                obj = convert_list[sha]
+            else:
+                obj = self.git.get_object(sha)
+                convert_list[sha] = obj
             assert isinstance(obj, Commit)
             for p in obj.parents:
                 if p not in done:
@@ -630,7 +634,6 @@
                     break
             else:
                 commits.append(sha)
-                convert_list[sha] = obj
                 done.add(sha)
                 todo.pop()
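
The repeated reads described in the commit message can be reproduced with a small standalone sketch. This is not the hg-git code: topo_order, history, and the use_cache flag are hypothetical names introduced for illustration, and commits are modelled as a plain dict mapping a SHA to its list of parent SHAs instead of dulwich Commit objects. The loop shape (peek at the top of todo, push the first unfinished parent, come back later) follows the hunk above, and the cache dict plays the role of convert_list.

```python
def topo_order(heads, get_parents, use_cache):
    """Return (commits in parents-first order, number of object reads).

    get_parents(sha) stands in for self.git.get_object(sha).parents.
    """
    todo = list(heads)
    done = set()
    cache = {}      # plays the role of convert_list in the patch
    commits = []
    reads = 0
    while todo:
        sha = todo[-1]              # peek; the commit stays on the stack
        if sha in done:
            todo.pop()
            continue
        if use_cache and sha in cache:
            parents = cache[sha]    # revisit: no object read needed
        else:
            parents = get_parents(sha)
            reads += 1
            cache[sha] = parents
        for p in parents:
            if p not in done:
                # Process the parent first, then revisit this commit.
                todo.append(p)
                break
        else:
            # All parents are done, so this commit can be emitted.
            commits.append(sha)
            done.add(sha)
            todo.pop()
    return commits, reads


# A linear history of five commits: c0 <- c1 <- c2 <- c3 <- c4.
history = {"c%d" % i: (["c%d" % (i - 1)] if i else []) for i in range(5)}

order, reads_uncached = topo_order(["c4"], history.__getitem__, use_cache=False)
_, reads_cached = topo_order(["c4"], history.__getitem__, use_cache=True)
print(order)            # ['c0', 'c1', 'c2', 'c3', 'c4']
print(reads_uncached)   # 9
print(reads_cached)     # 5
```

In this toy model a linear chain of n commits costs 2n - 1 reads without the cache and n reads with it, which is consistent with the roughly halved pull time reported above.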