Kuang-che Wu 39ffd9977e sync: reduce multiprocessing serialization overhead
Background:
 - A Manifest object is large (for projects like Android) in terms of
   serialization cost and size (more than 1 MB).
 - Many Project objects usually share only a few Manifest objects.

Before this CL, Project objects were passed to workers via function
parameters. Function parameters are pickled separately, one chunk at a
time, so the same manifests were serialized again and again. The major
serialization overhead of repo sync was
  O(manifest_size * projects / chunksize)

This CL uses the following tricks to reduce serialization overhead.
 - All projects are pickled in one invocation. Because Project objects
   share manifests, the pickle library remembers which objects it has
   already seen and avoids serializing them again.
 - Pass the Project objects to workers at worker initialization time,
   and pass project indexes as function parameters instead. The number
   of workers is much smaller than the number of projects.
 - Worker init state is shared on Linux (fork based), so it requires
   zero serialization for Project objects.
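
The first trick can be illustrated with a minimal sketch. This is not
the code from this CL: plain dicts stand in for the Manifest and Project
objects, and the sizes are made up for illustration.

```python
import pickle

# Hypothetical stand-ins: one large "manifest" shared by many "projects".
manifest = {"payload": "x" * 100_000}
projects = [{"name": f"project-{i}", "manifest": manifest} for i in range(50)]

# Pickling each project on its own serializes the manifest every time.
separate_bytes = sum(len(pickle.dumps(p)) for p in projects)

# Pickling the whole list in one call lets pickle's memo table write the
# manifest once and emit short back-references for the other projects.
together_bytes = len(pickle.dumps(projects))

# Objects shared before pickling are still shared after unpickling.
restored = pickle.loads(pickle.dumps(projects))
```

Comparing separate_bytes with together_bytes shows the difference: the
first grows with the number of projects times the manifest size, while
the second grows with the manifest size plus a small cost per project.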

On Linux (fork based), the serialization overhead is
  O(projects)  --- one int per project
On Windows (spawn based), the serialization overhead is
  O(manifest_size * min(workers, projects))

Moreover, use chunksize=1 to avoid the chance that some workers are idle
while other workers still have more than one job in their chunk queue.

Using 2.7k projects as the baseline, a no-op "repo sync" originally
took 31s for fetch and 25s for checkout on my Linux workstation.
With this CL, it takes 12s for fetch and 1s for checkout.

Bug: b/371638995
Change-Id: Ifa22072ea54eacb4a5c525c050d84de371e87caa
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/439921
Tested-by: Kuang-che Wu <kcwu@google.com>
Reviewed-by: Josip Sokcevic <sokcevic@google.com>
Commit-Queue: Kuang-che Wu <kcwu@google.com>
2024-10-23 02:58:45 +00:00
.github/workflows tests: added python 3.12 2023-10-17 13:58:33 +00:00
docs git: raise soft version to 2.7.4 2024-03-20 21:11:26 +00:00
hooks release: update-hooks: helper for automatically syncing hooks 2024-04-23 18:31:51 +00:00
man init: add --manifest-upstream-branch 2024-09-26 00:52:28 +00:00
release release: update-hooks: helper for automatically syncing hooks 2024-04-23 18:31:51 +00:00
subcmds sync: reduce multiprocessing serialization overhead 2024-10-23 02:58:45 +00:00
tests sync: Always use WORKER_BATCH_SIZE 2024-10-07 18:44:19 +00:00
.flake8 flake8: exclude venv and .tox folder 2023-08-15 15:46:52 +00:00
.gitattributes Adds additional crlf clobber avoidance. 2016-06-22 08:36:45 +00:00
.gitignore Add parallelism to 'branches' command 2020-12-14 23:35:12 +00:00
.gitreview git-review: add config file 2021-11-15 01:39:36 +00:00
.isort.cfg isort: format codebase 2023-08-22 18:32:22 +00:00
.mailmap Update .mailmap 2020-02-13 04:49:55 +00:00
.project Set correct name in PyDev and Eclipse project config 2013-04-19 09:35:43 +09:00
.pydevproject Leverage the next keyword from python 2.7 2018-12-19 11:06:35 -08:00
color.py color: fix have_fg not re assign to true 2024-09-12 16:15:06 +00:00
command.py sync: reduce multiprocessing serialization overhead 2024-10-23 02:58:45 +00:00
completion.bash bash-completion: complete projects with repo forall 2021-07-27 06:20:52 +00:00
constraints.txt tox.ini, constraints.txt: Lock the version of black to <24 2024-09-12 16:05:35 +00:00
editor.py cleanup: Update codebase to expect Python 3.6 2023-10-31 16:03:54 +00:00
error.py project: Handle git sso auth failures as repo exit 2024-10-03 20:47:50 +00:00
event_log.py delete Python 2 (object) compat 2023-10-20 04:51:01 +00:00
fetch.py isort: format codebase 2023-08-22 18:32:22 +00:00
git_command.py Disable git terminal prompt during fetch/clone 2024-09-26 22:10:36 +00:00
git_config.py cleanup: Update codebase to expect Python 3.6 2023-10-31 16:03:54 +00:00
git_refs.py cleanup: convert exceptions to OSError 2023-10-21 00:56:10 +00:00
git_ssh add license header to a few more files 2019-06-13 13:23:19 -04:00
git_superproject.py superproject: Remove notice about beta 2024-10-03 20:37:18 +00:00
git_trace2_event_log_base.py git_trace2: Add socket timeout 2023-12-19 19:38:52 +00:00
git_trace2_event_log.py Log ErrorEvent for failing GitCommands 2023-09-06 18:22:33 +00:00
hooks.py cleanup: Update codebase to expect Python 3.6 2023-10-31 16:03:54 +00:00
LICENSE setup.py: add basic packaging files 2019-12-02 04:23:31 +00:00
main.py main: Stringify project name in error_info 2024-03-15 19:26:10 +00:00
manifest_xml.py gitc: delete a few more dead references 2024-04-18 02:30:06 +00:00
MANIFEST.in setup.py: add basic packaging files 2019-12-02 04:23:31 +00:00
pager.py isort: format codebase 2023-08-22 18:32:22 +00:00
platform_utils_win32.py cleanup: Update codebase to expect Python 3.6 2023-10-31 16:03:54 +00:00
platform_utils.py Remove platform_utils.realpath 2024-03-27 17:13:58 +00:00
progress.py cleanup: Update codebase to expect Python 3.6 2023-10-31 16:03:54 +00:00
project.py Fix incremental syncs for prjs with submodules 2024-10-18 03:55:10 +00:00
pyproject.toml tests: added python 3.12 2023-10-17 13:58:33 +00:00
README.md update links from monorail to issuetracker 2023-06-14 21:19:58 +00:00
repo init: add --manifest-upstream-branch 2024-09-26 00:52:28 +00:00
repo_logging.py logging: Fix log formatting with colored output 2024-07-02 06:24:31 +00:00
repo_trace.py cleanup: delete redundant "r" open mode 2023-10-21 00:55:33 +00:00
requirements.json git: raise hard version to 1.9.1 2024-05-01 15:23:50 +00:00
run_tests release: update-hooks: helper for automatically syncing hooks 2024-04-23 18:31:51 +00:00
run_tests.vpython3 isort: format codebase 2023-08-22 18:32:22 +00:00
setup.py isort: format codebase 2023-08-22 18:32:22 +00:00
ssh.py ssh: Set git protocol version 2 on SSH ControlMaster 2024-05-16 13:26:46 +00:00
SUBMITTING_PATCHES.md SUBMITTING_PATCHES: update with commit queue details 2023-05-11 19:27:57 +00:00
tox.ini tox.ini: Make the lint and format environments run black for all code 2024-09-12 16:09:24 +00:00
wrapper.py git_command: unify soft/hard versions with requirements.json 2024-03-21 21:20:50 +00:00

repo

Repo is a tool built on top of Git. Repo helps manage many Git repositories, does the uploads to revision control systems, and automates parts of the development workflow. Repo is not meant to replace Git, only to make it easier to work with Git. The repo command is an executable Python script that you can put anywhere in your path.

Contact

Please use the repo-discuss mailing list or issue tracker for questions.

You can file a new bug report under the "repo" component.

Please do not e-mail individual developers for support. They do not have the bandwidth for it, and oftentimes questions have already been asked on repo-discuss or bugs posted to the issue tracker. So please search those sites first.

Install

Many distros include repo, so you might be able to install from there.

# Debian/Ubuntu.
$ sudo apt-get install repo

# Gentoo.
$ sudo emerge dev-vcs/repo

You can also install it manually, since it's a single script.

$ mkdir -p ~/.bin
$ PATH="${HOME}/.bin:${PATH}"
$ curl https://storage.googleapis.com/git-repo-downloads/repo > ~/.bin/repo
$ chmod a+rx ~/.bin/repo