git-repo/docs/internal-fs-layout.md

263 lines
13 KiB
Markdown
Raw Permalink Normal View History

# Repo internal filesystem layout
A reference to the `.repo/` tree in repo client checkouts.
Hopefully it's complete & up-to-date, but who knows!
*** note
**Warning**:
This is meant for developers of the repo project itself as a quick reference.
**Nothing** in here must be construed as ABI, or that repo itself will never
change its internals in backwards incompatible ways.
***
[TOC]
## .repo/ layout
All content under `.repo/` is managed by `repo` itself with few exceptions.
In general, you should not make manual changes in here.
If a setting was initialized using an option to `repo init`, you should use that
command to change the setting later on.
It is always safe to re-run `repo init` in existing repo client checkouts.
For example, if you want to change the manifest branch, you can simply run
`repo init --manifest-branch=<new name>` and repo will take care of the rest.
* `config`: Per-repo client checkout settings using [git-config] file format.
* `.repo_config.json`: JSON cache of the `config` file for repo to
read/process quickly.
### repo/ state
* `repo/`: A git checkout of the repo project. This is how `repo` re-execs
itself to get the latest released version.
It tracks the git repository at `REPO_URL` using the `REPO_REV` branch.
Those are specified at `repo init` time using the `--repo-url=<REPO_URL>`
and `--repo-rev=<REPO_REV>` options.
Any changes made to this directory will usually be automatically discarded
by repo itself when it checks for updates. If you want to update to the
latest version of repo, use `repo selfupdate` instead. If you want to
change the git URL/branch that this tracks, re-run `repo init` with the new
settings.
* `.repo_fetchtimes.json`: Used by `repo sync` to record stats when syncing
the various projects.
### Manifests
For more documentation on the manifest format, including the local_manifests
support, see the [manifest-format.md] file.
* `manifests/`: A git checkout of the manifest project. Its `.git/` state
points to the `manifest.git` bare checkout (see below). It tracks the git
branch specified at `repo init` time via `--manifest-branch`.
The local branch name is always `default` regardless of the remote tracking
branch. Do not get confused if the remote branch is not `default`, or if
there is a remote `default` that is completely different!
No manual changes should be made in here as it will just confuse repo and
it won't automatically recover causing no new changes to be picked up.
* `manifests.git/`: A bare checkout of the manifest project. It tracks the
git repository specified at `repo init` time via `--manifest-url`.
No manual changes should be made in here as it will just confuse repo.
If you want to switch the tracking settings, re-run `repo init` with the
new settings.
* `manifest.xml`: The manifest that repo uses. It is generated at `repo init`
and uses the `--manifest-name` to determine what manifest file to load next
out of `manifests/`.
Do not try to modify this to load other manifests as it will confuse repo.
If you want to switch manifest files, re-run `repo init` with the new
setting.
Older versions of repo managed this with symlinks.
* `manifest.xml -> manifests/<manifest-name>.xml`: A symlink to the manifest
that the user wishes to sync. It is specified at `repo init` time via
`--manifest-name`.
* `manifests.git/.repo_config.json`: JSON cache of the `manifests.git/config`
file for repo to read/process quickly.
* `local_manifest.xml` (*Deprecated*): User-authored tweaks to the manifest
used to sync. See [local manifests] for more details.
* `local_manifests/`: Directory of user-authored manifest fragments to tweak
the manifest used to sync. See [local manifests] for more details.
### Project objects
*** note
**Warning**: Please do not use repo's approach to projects/ & project-objects/
layouts as a model for other tools to implement similar approaches.
It has a number of known downsides like:
* [Symlinks do not work well under Windows](./windows.md).
* Git sometimes replaces symlinks under .git/ with real files (under unknown
circumstances), and then the internal state gets out of sync, and data loss
may ensue.
* When sharing project-objects between multiple project checkouts, Git might
automatically run `gc` or `prune` which may lead to data loss or corruption
(since those operate on leaf projects and miss refs in other leaves). See
https://gerrit-review.googlesource.com/c/git-repo/+/254392 for more details.
Instead, you should use standard Git workflows like [git worktree] or
[gitsubmodules] with [superprojects].
***
* `copy-link-files.json`: Tracking file used by `repo sync` to determine when
copyfile or linkfile are added or removed and need corresponding updates.
* `project.list`: Tracking file used by `repo sync` to determine when projects
are added or removed and need corresponding updates in the checkout.
* `projects/`: Bare checkouts of every project synced by the manifest. The
filesystem layout matches the `<project path=...` setting in the manifest
(i.e. where it's checked out in the repo client source tree). Those
checkouts will symlink their `.git/` state to paths under here.
Some git state is further split out under `project-objects/`.
* `project-objects/`: Git objects that are safe to share across multiple
git checkouts. The filesystem layout matches the `<project name=...`
setting in the manifest (i.e. the path on the remote server) with a `.git`
suffix. This allows for multiple checkouts of the same remote git repo to
share their objects. For example, you could have different branches of
`foo/bar.git` checked out to `foo/bar-main`, `foo/bar-release`, etc...
There will be multiple trees under `projects/` for each one, but only one
under `project-objects/`.
This layout is designed to allow people to sync against different remotes
(e.g. a local mirror & a public review server) while avoiding duplicating
the content. However, this can run into problems if different remotes use
the same path on their respective servers. Best to avoid that.
* `subprojects/`: Like `projects/`, but for git submodules.
* `subproject-objects/`: Like `project-objects/`, but for git submodules.
* `worktrees/`: Bare checkouts of every project synced by the manifest. The
filesystem layout matches the `<project name=...` setting in the manifest
(i.e. the path on the remote server) with a `.git` suffix. This has the
same advantages as the `project-objects/` layout above.
This is used when [git worktree]'s are enabled.
### Global settings
The `.repo/manifests.git/config` file is used to track settings for the entire
repo client checkout.
sync: Added logging of repo sync state and config options for analysis. git_config.py: + Added SyncAnalysisState class, which saves the following data into the config object. ++ sys.argv, options, superproject's logging data. ++ repo.*, branch.* and remote.* parameters from config object. ++ current time as synctime. ++ Version number of the object. + All the keys for the above data are prepended with 'repo.syncstate.' + Added GetSyncAnalysisStateData and UpdateSyncAnalysisState methods to GitConfig object to save/get the above data. git_trace2_event_log.py: + Added LogConfigEvents method with code from DefParamRepoEvents to log events. sync.py: + superproject_logging_data is a dictionary that collects all the superproject data that is to be logged as trace2 event. + Sync at the end logs the previously saved syncstate.* parameters as previous_sync_state. Then it calls config's UpdateSyncAnalysisState to save and log all the current options, superproject logged data. docs/internal-fs-layout.md: + Added doc string explaining [repo.syncstate ...] sections of .repo/manifests.git/config file. test_git_config.py: + Added unit test for the new methods of GitConfig object. Tested: $ ./run_tests $ repo_dev init --use-superproject -u https://android.googlesource.com/platform/manifest Tested it by running the following command multiple times. $ repo_dev sync -j 20 repo sync has finished successfully Verified config file has [syncstate ...] data saved. Bug: [google internal] b/188573450 Change-Id: I1f914ce50f3382111b72940ca56de7c41b53d460 Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/313123 Tested-by: Raman Tenneti <rtenneti@google.com> Reviewed-by: Mike Frysinger <vapier@google.com> Reviewed-by: Xin Li <delphij@google.com>
2021-07-28 21:36:49 +00:00
Most settings use the `[repo]` section to avoid conflicts with git.
sync: Added logging of repo sync state and config options for analysis. git_config.py: + Added SyncAnalysisState class, which saves the following data into the config object. ++ sys.argv, options, superproject's logging data. ++ repo.*, branch.* and remote.* parameters from config object. ++ current time as synctime. ++ Version number of the object. + All the keys for the above data are prepended with 'repo.syncstate.' + Added GetSyncAnalysisStateData and UpdateSyncAnalysisState methods to GitConfig object to save/get the above data. git_trace2_event_log.py: + Added LogConfigEvents method with code from DefParamRepoEvents to log events. sync.py: + superproject_logging_data is a dictionary that collects all the superproject data that is to be logged as trace2 event. + Sync at the end logs the previously saved syncstate.* parameters as previous_sync_state. Then it calls config's UpdateSyncAnalysisState to save and log all the current options, superproject logged data. docs/internal-fs-layout.md: + Added doc string explaining [repo.syncstate ...] sections of .repo/manifests.git/config file. test_git_config.py: + Added unit test for the new methods of GitConfig object. Tested: $ ./run_tests $ repo_dev init --use-superproject -u https://android.googlesource.com/platform/manifest Tested it by running the following command multiple times. $ repo_dev sync -j 20 repo sync has finished successfully Verified config file has [syncstate ...] data saved. Bug: [google internal] b/188573450 Change-Id: I1f914ce50f3382111b72940ca56de7c41b53d460 Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/313123 Tested-by: Raman Tenneti <rtenneti@google.com> Reviewed-by: Mike Frysinger <vapier@google.com> Reviewed-by: Xin Li <delphij@google.com>
2021-07-28 21:36:49 +00:00
Everything under `[repo.syncstate.*]` is used to keep track of sync details for logging
purposes.
User controlled settings are initialized when running `repo init`.
init: Added --partial-clone-exclude option. partial-clone-exclude option excludes projects during partial clone. This is a comma-delimited project names (from manifest.xml). This option is persisted and it is used by the sync command. A project that has been unparital'ed will remain unpartial if that project's name is specified in the --partial-clone-exclude option. The project name should match exactly. Added $ ./run_tests -v Bug: [google internal] b/175712967 "I can't "unpartial" my androidx-main checkout" $ rm -rf androidx-main/ $ mkdir androidx-main/ $ cd androidx-main/ $ repo_dev init -u https://android.googlesource.com/platform/manifest -b androidx-main --partial-clone --clone-filter=blob:limit=10M -m default.xml $ repo_dev sync -c -j8 + Verify a project is partial $ cd frameworks/support/ $ git config -l | grep 'partial' + Unpartial a project. $ /google/bin/releases/android/git_repack/git_unpartial + Verify project is unpartial $ git config -l | grep 'partial' $ cd ../.. + Exclude the project from being unparial'ed after init and sync. $ repo_dev init -u https://android.googlesource.com/platform/manifest -b androidx-main --partial-clone --clone-filter=blob:limit=10M --partial-clone-exclude="platform/frameworks/support,platform/frameworks/support-golden" -m default.xml + Verify project is unpartial $ cd frameworks/support/ $ git config -l | grep 'partial' $ cd ../.. $ repo_dev sync -c -j8 $ cd frameworks/support/ $ git config -l | grep 'partial' $ cd ../.. + Remove the project from exclude list and verify that project is partially cloned. $ repo_dev init -u https://android.googlesource.com/platform/manifest -b androidx-main --partial-clone --clone-filter=blob:limit=10M --partial-clone-exclude= -m default.xml $ repo_dev sync -c -j8 $ cd frameworks/support/ $ git config -l | grep 'partial' Change-Id: Id5dba418eba1d3f54b54e826000406534c0ec196 Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/303162 Reviewed-by: Mike Frysinger <vapier@google.com> Tested-by: Raman Tenneti <rtenneti@google.com>
2021-04-13 03:57:25 +00:00
| Setting | `repo init` Option | Use/Meaning |
|------------------- |---------------------------|-------------|
| manifest.groups | `--groups` & `--platform` | The manifest groups to sync |
| repo.archive | `--archive` | Use `git archive` for checkouts |
| repo.clonebundle | `--clone-bundle` | Whether the initial sync used clone.bundle explicitly |
| repo.clonefilter | `--clone-filter` | Filter setting when using [partial git clones] |
| repo.depth | `--depth` | Create shallow checkouts when cloning |
| repo.dissociate | `--dissociate` | Dissociate from any reference/mirrors after initial clone |
| repo.mirror | `--mirror` | Checkout is a repo mirror |
| repo.partialclone | `--partial-clone` | Create [partial git clones] |
| repo.partialcloneexclude | `--partial-clone-exclude` | Comma-delimited list of project names (not paths) to exclude while using [partial git clones] |
| repo.reference | `--reference` | Reference repo client checkout |
| repo.submodules | `--submodules` | Sync git submodules |
| repo.superproject | `--use-superproject` | Sync [superproject] |
| repo.worktree | `--worktree` | Use [git worktree] for checkouts |
| user.email | `--config-name` | User's e-mail address; Copied into `.git/config` when checking out a new project |
| user.name | `--config-name` | User's name; Copied into `.git/config` when checking out a new project |
[partial git clones]: https://git-scm.com/docs/gitrepository-layout#_code_partialclone_code
init: added --use-superproject option to clone superproject. Added --no-use-superproject to repo and init.py to disable use of manifest superprojects. Replaced the term "sha" with "commit id". Added _GetBranch method to Superproject object. Moved shared code between init and sync into SyncSuperproject function. This function either does git clone or git fetch. If git fetch fails it does git clone. Changed Superproject constructor to accept manifest, repodir and branch to avoid passing them to multiple functions as argument. Changed functions that were raising exceptions to return either True or False. Saved the --use-superproject option in config as repo.superproject. Updated internal-fs-layout.md document. Updated the tests to work with the new API changes in Superproject. Performance for the first time sync has improved from 20 minutes to around 15 minutes. Tested the code with the following commands. $ ./run_tests -v Tested the sync code by using repo_dev alias and pointing to this CL. $ repo init took around 20 seconds longer because of cloning of superproject. $ time repo_dev init -u sso://android.git.corp.google.com/platform/manifest -b master --partial-clone --clone-filter=blob:limit=10M --repo-rev=main --use-superproject ... real 0m35.919s user 0m21.947s sys 0m8.977s First run $ time repo sync --use-superproject ... real 16m41.982s user 100m6.916s sys 19m18.753s No difference in repo sync time after the first run. Bug: [google internal] b/179090734 Bug: https://crbug.com/gerrit/13709 Bug: https://crbug.com/gerrit/13707 Change-Id: I12df92112f46e001dfbc6f12cd633c3a15cf924b Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/296382 Reviewed-by: Mike Frysinger <vapier@google.com> Tested-by: Raman Tenneti <rtenneti@google.com>
2021-02-09 08:26:31 +00:00
[superproject]: https://en.wikibooks.org/wiki/Git/Submodules_and_Superprojects
### Repo hooks settings
For more details on this feature, see the [repo-hooks docs](./repo-hooks.md).
We'll just discuss the internal configuration settings.
These are stored in the registered `<repo-hooks>` project itself, so if the
manifest switches to a different project, the settings will not be copied.
| Setting | Use/Meaning |
|--------------------------------------|-------------|
| repo.hooks.\<hook\>.approvedmanifest | User approval for secure manifest sources (e.g. https://) |
| repo.hooks.\<hook\>.approvedhash | User approval for insecure manifest sources (e.g. http://) |
For example, if our manifest had the following entries, we would store settings
under `.repo/projects/src/repohooks.git/config` (which would be reachable via
`git --git-dir=src/repohooks/.git config`).
```xml
<project path="src/repohooks" name="chromiumos/repohooks" ... />
<repo-hooks in-project="chromiumos/repohooks" ... />
```
If `<hook>` is `pre-upload`, the `.git/config` setting might be:
```ini
[repo "hooks.pre-upload"]
approvedmanifest = https://chromium.googlesource.com/chromiumos/manifest
```
## Per-project settings
These settings are somewhat meant to be tweaked by the user on a per-project
basis (e.g. `git config` in a checked out source repo).
Where possible, we re-use standard git settings to avoid confusion, and we
refrain from documenting those, so see [git-config] documentation instead.
See `repo help upload` for documentation on `[review]` settings.
The `[remote]` settings are automatically populated/updated from the manifest.
The `[branch]` settings are updated by `repo start` and `git branch`.
| Setting | Subcommands | Use/Meaning |
|-------------------------------|---------------|-------------|
| review.\<url\>.autocopy | upload | Automatically add to `--cc=<value>` |
| review.\<url\>.autoreviewer | upload | Automatically add to `--reviewers=<value>` |
| review.\<url\>.autoupload | upload | Automatically answer "yes" or "no" to all prompts |
| review.\<url\>.uploadhashtags | upload | Automatically add to `--hashtag=<value>` |
| review.\<url\>.uploadlabels | upload | Automatically add to `--label=<value>` |
| review.\<url\>.uploadnotify | upload | [Notify setting][upload-notify] to use |
| review.\<url\>.uploadtopic | upload | Default [topic] to use |
| review.\<url\>.username | upload | Override username with `ssh://` review URIs |
| remote.\<remote\>.fetch | sync | Set of refs to fetch |
| remote.\<remote\>.projectname | \<network\> | The name of the project as it exists in Gerrit review |
| remote.\<remote\>.pushurl | upload | The base URI for pushing CLs |
| remote.\<remote\>.review | upload | The URI of the Gerrit review server |
| remote.\<remote\>.url | sync & upload | The URI of the git project to fetch |
| branch.\<branch\>.merge | sync & upload | The branch to merge & upload & track |
| branch.\<branch\>.remote | sync & upload | The remote to track |
## ~/ dotconfig layout
Repo will create & maintain a few files in the user's home directory.
* `.repoconfig/`: Repo's per-user directory for all random config files/state.
* `.repoconfig/config`: Per-user settings using [git-config] file format.
* `.repoconfig/keyring-version`: Cache file for checking if the gnupg subdir
has all the same keys as the repo launcher. Used to avoid running gpg
constantly as that can be quite slow.
* `.repoconfig/gnupg/`: GnuPG's internal state directory used when repo needs
to run `gpg`. This provides isolation from the user's normal `~/.gnupg/`.
* `.repoconfig/.repo_config.json`: JSON cache of the `.repoconfig/config`
file for repo to read/process quickly.
* `.repo_.gitconfig.json`: JSON cache of the `.gitconfig` file for repo to
read/process quickly.
[git-config]: https://git-scm.com/docs/git-config
[git worktree]: https://git-scm.com/docs/git-worktree
[gitsubmodules]: https://git-scm.com/docs/gitsubmodules
[manifest-format.md]: ./manifest-format.md
[local manifests]: ./manifest-format.md#Local-Manifests
[superprojects]: https://en.wikibooks.org/wiki/Git/Submodules_and_Superprojects
[topic]: https://gerrit-review.googlesource.com/Documentation/intro-user.html#topics
[upload-notify]: https://gerrit-review.googlesource.com/Documentation/user-upload.html#notify