[{"data":1,"prerenderedAt":792},["ShallowReactive",2],{"/en-us/blog/scaling-the-gitlab-database":3,"navigation-en-us":37,"banner-en-us":437,"footer-en-us":447,"blog-post-authors-en-us-Yorick Peterse":689,"blog-related-posts-en-us-scaling-the-gitlab-database":703,"assessment-promotions-en-us":743,"next-steps-en-us":782},{"id":4,"title":5,"authorSlugs":6,"body":8,"categorySlug":9,"config":10,"content":14,"description":8,"extension":25,"isFeatured":12,"meta":26,"navigation":27,"path":28,"publishedDate":20,"seo":29,"stem":33,"tagSlugs":34,"__hash__":36},"blogPosts/en-us/blog/scaling-the-gitlab-database.yml","Scaling The Gitlab Database",[7],"yorick-peterse",null,"engineering",{"slug":11,"featured":12,"template":13},"scaling-the-gitlab-database",false,"BlogPost",{"title":15,"description":16,"authors":17,"heroImage":19,"date":20,"body":21,"category":9,"tags":22},"Scaling the GitLab database","An in-depth look at the challenges faced when scaling the GitLab database and the solutions we applied to help solve the problems with our database setup.",[18],"Yorick Peterse","https://res.cloudinary.com/about-gitlab-com/image/upload/v1749666699/Blog/Hero%20Images/banner.jpg","2017-10-02","For a long time GitLab.com used a single PostgreSQL database server and a single replica for disaster recovery purposes. This worked reasonably well for the first few years of GitLab.com's existence, but over time we began seeing more and more problems with this setup. In this article we'll take a look at what we did to help solve these problems for both GitLab.com and self-managed GitLab instances.\n\n\u003C!-- more -->\n\nFor example, the database was under constant pressure, with CPU utilization hovering around 70 percent almost all the time. Not because we used all available resources in the best way possible, but because we were bombarding the server with too many (badly optimized) queries. We realized we needed a better setup that would allow us to balance the load and make GitLab.com more resilient to any problems that may occur on the primary database server.\n\nWhen tackling these problems using PostgreSQL there are essentially four techniques you can apply:\n\n1. Optimize your application code so the queries are more efficient (and\n    ideally use fewer resources).\n2. Use a connection pooler to reduce the number of\n    database connections (and associated resources) necessary.\n3. Balance the load across multiple database servers.\n\n4. Shard your database.\n\nOptimizing the application code is something we have been working on actively for the past two years, but it's not a final solution. Even if you improve performance, when traffic also increases you may still need to apply the other two techniques. For the sake of this article we'll skip over this particular subject and instead focus on the other techniques.\n\n## Connection pooling\n\nIn PostgreSQL a connection is handled by starting an OS process which in turn needs a number of resources. The more connections (and thus processes), the more resources your database will use. PostgreSQL also enforces a maximum number of connections as defined in the [max_connections][max-connections] setting.\nOnce you hit this limit PostgreSQL will reject new connections. Such a setup can be illustrated using the following diagram:\n\n\n\n![PostgreSQL\nDiagram](https://about.gitlab.com/images/scaling-the-gitlab-database/postgresql.svg)\n\nHere our clients connect directly to PostgreSQL, thus requiring one connection per client.\n\nBy pooling connections we can have multiple client-side connections reuse\n\nPostgreSQL connections. For example, without pooling we'd need 100\nPostgreSQL connections to handle 100 client connections; with connection pooling we may only need 10 or so PostgreSQL connections depending on our configuration.\nThis means our connection diagram will instead look something like the following:\n\n\n\n![Connection Pooling\nDiagram](https://about.gitlab.com/images/scaling-the-gitlab-database/pooler.svg)\n\nHere we show an example where four clients connect to pgbouncer but instead of using four PostgreSQL connections we only need two of them.\n\nFor PostgreSQL there are two connection poolers that are most commonly used:\n\n* [pgbouncer][pgbouncer]\n\n* [pgpool-II][pgpool]\n\npgpool is a bit special because it does much more than just connection pooling:\n\nit has a built-in query caching mechanism, can balance load across multiple databases, manage replication, and more.\n\nOn the other hand pgbouncer is much simpler: all it does is connection pooling.\n\n## Database load balancing\n\nLoad balancing on the database level is typically done by making use of\n\nPostgreSQL's \"[hot standby][hot-standby]\" feature. A hot-standby is a\nPostgreSQL replica that allows you to run read-only SQL queries, contrary to a regular standby that does not allow any SQL queries to be executed. To balance load you'd set up one or more hot-standby servers and somehow balance read-only queries across these hosts while sending all other operations to the primary.\n\nScaling such a setup is fairly easy: simply add more hot-standby servers (if necessary) as your read-only traffic increases.\n\nAnother benefit of this approach is having a more resilient database cluster.\n\nWeb requests that only use a secondary can continue to operate even if the primary server is experiencing issues; though of course you may still run into errors should those requests end up using the primary.\n\nThis approach however can be quite difficult to implement. For example, explicit transactions must be executed on the primary since they may contain writes.\n\nFurthermore, after a write we want to continue using the primary for a little while because the changes may not yet be available on the hot-standby servers when using asynchronous replication.\n\n## Sharding\n\nSharding is the act of horizontally partitioning your data. This means that data resides on specific servers and is retrieved using a shard key. For example, you may partition data per project and use the project ID as the shard key.\nSharding a database is interesting when you have a very high write load (as there's no other easy way of balancing writes other than perhaps a multi-master setup), or when you have _a lot_ of data and you can no longer store it in a conventional manner (e.g. you simply can't fit it all on a single disk).\n\nUnfortunately the process of setting up a sharded database is a massive undertaking, even when using software such as [Citus][citus]. Not only do you need to set up the infrastructure (which varies in complexity depending on whether you run it yourself or use a hosted solution), but you also need to adjust large portions of your application to support sharding.\n\n### Cases against sharding\n\nOn GitLab.com the write load is typically very low, with most of the database queries being read-only queries. In very exceptional cases we may spike to 1500 tuple writes per second, but most of the time we barely make it past 200 tuple writes per second. On the other hand we can easily read up to 10 million tuples per second on any given secondary.\n\nStorage-wise, we also don't use that much data: only about 800 GB. A large portion of this data is data that is being migrated in the background. Once those migrations are done we expect our database to shrink in size quite a bit.\n\nThen there's the amount of work required to adjust the application so all queries use the right shard keys. While quite a few of our queries usually include a project ID which we could use as a shard key, there are also many queries where this isn't the case. Sharding would also affect the process of contributing changes to GitLab as every contributor would now have to make sure a shard key is present in their queries.\n\nFinally, there is the infrastructure that's necessary to make all of this work.\n\nServers have to be set up, monitoring has to be added, engineers have to be trained so they are familiar with this new setup, the list goes on. While hosted solutions may remove the need for managing your own servers it doesn't solve all problems. Engineers still have to be trained and (most likely very expensive)\n\nbills have to be paid. At GitLab we also highly prefer to ship the tools we need so the community can make use of them. This means that if we were going to shard the database we'd have to ship it (or at least parts of it) in our Omnibus packages. The only way you can make sure something you ship works is by running it yourself, meaning we wouldn't be able to use a hosted solution.\n\nUltimately we decided against sharding the database because we felt it was an expensive, time-consuming, and complex solution to a problem we do not have.\n\n## Connection pooling for GitLab\n\nFor connection pooling we had two main requirements:\n\n1. It has to work well (obviously).\n\n2. It has to be easy to ship in our Omnibus packages so our users can also\ntake advantage of the connection pooler.\n\nReviewing the two solutions (pgpool and pgbouncer) was done in two steps:\n\n1. Perform various technical tests (does it work, how easy is it to\nconfigure, etc).\n2. Find out what the experiences are of other users of the solution, what\n    problems they ran into and how they dealt with them, etc.\n\npgpool was the first solution we looked into, mostly because it seemed quite attractive based on all the features it offered. Some of the data from our tests can be found in [this][pgpool-comment-data] comment.\n\nUltimately we decided against using pgpool based on a number of factors. For example, pgpool does not support sticky connections. This is problematic when performing a write and (trying to) display the results right away. Imagine creating an issue and being redirected to the page, only to run into an HTTP 404 error because the server used for any read-only queries did not yet have the data. One way to work around this would be to use synchronous replication, but this brings many other problems to the table; problems we prefer to avoid.\n\nAnother problem is that pgpool's load balancing logic is decoupled from your application and operates by parsing SQL queries and sending them to the right server. Because this happens outside of your application you have very little control over which query runs where. This may actually be beneficial to some because you don't need additional application logic, but it also prevents you from adjusting the routing logic if necessary.\n\nConfiguring pgpool also proved quite difficult due to the sheer number of configuration options. Perhaps the final nail in the coffin was the feedback we got on pgpool from those having used it in the past. The feedback we received regarding pgpool was usually negative, though not very detailed in most cases.\n\nWhile most of the complaints appeared to be related to earlier versions of pgpool it still made us doubt if using it was the right choice.\n\nThe feedback combined with the issues described above ultimately led to us deciding against using pgpool and using pgbouncer instead. We performed a similar set of tests with pgbouncer and were very satisfied with it. It's fairly easy to configure (and doesn't have that much that needs configuring in the first place), relatively easy to ship, focuses only on connection pooling (and does it really well), and had very little (if any) noticeable overhead.\nPerhaps my only complaint would be that the pgbouncer website can be a little bit hard to navigate.\n\nUsing pgbouncer we were able to drop the number of active PostgreSQL connections from a few hundred to only 10-20 by using transaction pooling. We opted for using transaction pooling since Rails database connections are persistent.\nIn such a setup, using session pooling would prevent us from being able to reduce the number of PostgreSQL connections, thus brining few (if any) benefits. By using transaction pooling we were able to drop PostgreSQL's `max_connections` setting from 3000 (the reason for this particular value was never really clear)\n\nto 300. pgbouncer is configured in such a way that even at peak capacity we will only need 200 connections; giving us some room for additional connections such as `psql` consoles and maintenance tasks.\n\nA side effect of using transaction pooling is that you cannot use prepared statements, as the `PREPARE` and `EXECUTE` commands may end up running in different connections; producing errors as a result. Fortunately we did not measure any increase in response timings when disabling prepared statements, but we _did_ measure a reduction of roughly 20 GB in memory usage on our database servers.\n\nTo ensure both web requests and background jobs have connections available we set up two separate pools: one pool of 150 connections for background processing, and a pool of 50 connections for web requests. For web requests we rarely need more than 20 connections, but for background processing we can easily spike to a 100 connections simply due to the large number of background processes running on GitLab.com.\n\nToday we ship pgbouncer as part of GitLab EE's High Availability package.\nFor more information you can refer to [\"Omnibus GitLab PostgreSQL High Availability.\"][ha-docs]\n\n## Database load balancing for GitLab\n\nWith pgpool and its load balancing feature out of the picture we needed something else to spread load across multiple hot-standby servers.\n\nFor (but not limited to) Rails applications there is a library called [Makara][makara] which implements load balancing logic and includes a default implementation for ActiveRecord. Makara however has some problems that were a deal-breaker for us. For example, its support for sticky connections is very limited: when you perform a write the connection will stick to the primary using a cookie, with a fixed TTL. This means that if replication lag is greater than the TTL you may still end up running a query on a host that doesn't have the data you need.\n\nMakara also requires you to configure quite a lot, such as all the database hosts and their roles, with no service discovery mechanism (our current solution does not yet support this either, though it's planned for the near future).\nMakara also [does not appear to be thread-safe][makara-thread-safe], which is problematic since Sidekiq (the background processing system we use) is multi-threaded. Finally, we wanted to have control over the load balancing logic as much as possible.\n\nBesides Makara there's also [Octopus][octopus] which has some load balancing mechanisms built in. Octopus however is geared towards database sharding and not just balancing of read-only queries. As a result we did not consider using\n\nOctopus.\n\nUltimately this led to us building our own solution directly into GitLab EE.\n\nThe merge request adding the initial implementation can be found [here][lb-mr], though some changes, improvements, and fixes were applied later on.\n\nOur solution essentially works by replacing `ActiveRecord::Base.connection` with a proxy object that handles routing of queries. This ensures we can load balance as many queries as possible, even queries that don't originate directly from our own code. This proxy object in turn determines what host a query is sent to based on the methods called, removing the need for parsing SQL queries.\n\n### Sticky connections\n\nSticky connections are supported by storing a pointer to the current\nPostgreSQL\n\nWAL position the moment a write is performed. This pointer is then stored in\n\nRedis for a short duration at the end of a request. Each user is given their own key so that the actions of one user won't lead to all other users being affected. In the next request we get the pointer and compare this with all the secondaries. If all secondaries have a WAL pointer that exceeds our pointer we know they are in sync and we can safely use a secondary for our read-only queries. If one or more secondaries are not yet in sync we will continue using the primary until they are in sync. If no write is performed for 30 seconds and all the secondaries are still not in sync we'll revert to using the secondaries in order to prevent somebody from ending up running queries on the primary forever.\n\nChecking if a secondary has caught up is quite simple and is implemented in `Gitlab::Database::LoadBalancing::Host#caught_up?` as follows:\n\n```ruby\n\ndef caught_up?(location)\n  string = connection.quote(location)\n\n  query = \"SELECT NOT pg_is_in_recovery() OR \" \\\n    \"pg_xlog_location_diff(pg_last_xlog_replay_location(), #{string}) >= 0 AS result\"\n\n  row = connection.select_all(query).first\n\n  row && row['result'] == 't'\nensure\n  release_connection\nend\n\n```\n\nMost of the code here is standard Rails code to run raw queries and grab the results. The most interesting part is the query itself, which is as follows:\n\n```sql\n\nSELECT NOT pg_is_in_recovery()\n\nOR pg_xlog_location_diff(pg_last_xlog_replay_location(), WAL-POINTER) >= 0\nAS result\"\n\n```\n\nHere `WAL-POINTER` is the WAL pointer as returned by the PostgreSQL function `pg_current_xlog_insert_location()`, which is executed on the primary. In the above code snippet the pointer is passed as an argument, which is then quoted/escaped and passed to the query.\n\nUsing the function `pg_last_xlog_replay_location()` we can get the WAL pointer of a secondary, which we can then compare to our primary pointer using `pg_xlog_location_diff()`. If the result is greater than 0 we know the secondary is in sync.\n\nThe check `NOT pg_is_in_recovery()` is added to ensure the query won't fail when a secondary that we're checking was _just_ promoted to a primary and our\n\nGitLab process is not yet aware of this. In such a case we simply return `true` since the primary is always in sync with itself.\n\n### Background processing\n\nOur background processing code _always_ uses the primary since most of the work performed in the background consists of writes. Furthermore we can't reliably use a hot-standby as we have no way of knowing whether a job should use the primary or not as many jobs are not directly tied into a user.\n\n### Connection errors\n\nTo deal with connection errors our load balancer will not use a secondary if it is deemed to be offline, plus connection errors on any host (including the primary) will result in the load balancer retrying the operation a few times.\n\nThis ensures that we don't immediately display an error page in the event of a hiccup or a database failover. While we also deal with [hot standby conflicts][hot-standby-conflicts] on the load balancer level we ended up enabling `hot_standby_feedback` on our secondaries as doing so solved all hot-standby conflicts without having any negative impact on table bloat.\n\nThe procedure we use is quite simple: for a secondary we'll retry a few times with no delay in between. For a primary we'll retry the operation a few times using an exponential backoff.\n\nFor more information you can refer to the source code in GitLab EE:\n\n*\n\u003Chttps://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/lib/gitlab/database/load_balancing.rb>\n\n*\n\u003Chttps://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/lib/gitlab/database/load_balancing>\n\nDatabase load balancing was first introduced in GitLab 9.0 and _only_ supports\n\nPostgreSQL. More information can be found in the [9.0 release post][9-0-release] and the [documentation](https://docs.gitlab.com/ee/administration/postgresql/database_load_balancing.html).\n\n## Crunchy Data\n\nIn parallel to working on implementing connection pooling and load balancing we were working with [Crunchy Data][crunchy]. Until very recently I was the only [database specialist][database-specialist] which meant I had a lot of work on my plate. Furthermore my knowledge of PostgreSQL internals and its wide range of settings is limited (or at least was at the time), meaning there's only so much\n\nI could do. Because of this we hired Crunchy to help us out with identifying problems, investigating slow queries, proposing schema optimisations, optimising\n\nPostgreSQL settings, and much more.\n\nFor the duration of this cooperation most work was performed in confidential issues so we could share private data such as log files. With the cooperation coming to an end we have removed sensitive information from some of these issues and opened them up to the public. The primary issue was [gitlab-com/infrastructure#1448][issue-1448], which in turn led to many separate issues being created and resolved.\n\nThe benefit of this cooperation was immense as it helped us identify and solve many problems, something that would have taken me months to identify and solve if I had to do this all by myself.\n\nFortunately we recently managed to hire our [second database specialist][gstark] and we hope to grow the team more in the coming months.\n\n## Combining connection pooling and database load balancing\n\nCombining connection pooling and database load balancing allowed us to drastically reduce the number of resources necessary to run our database cluster as well as spread load across our hot-standby servers. For example, instead of our primary having a near constant CPU utilisation of 70 percent today it usually hovers between 10 percent and 20 percent, while our two hot-standby servers hover around 20 percent most of the time:\n\n![CPU\nPercentage](https://about.gitlab.com/images/scaling-the-gitlab-database/cpu-percentage.png)\n\nHere `db3.cluster.gitlab.com` is our primary while the other two hosts are our secondaries.\n\nOther load-related factors such as load averages, disk usage, and memory usage were also drastically improved. For example, instead of the primary having a load average of around 20 it barely goes above an average of 10:\n\n![CPU\nPercentage](https://about.gitlab.com/images/scaling-the-gitlab-database/load-averages.png)\n\nDuring the busiest hours our secondaries serve around 12 000 transactions per second (roughly 740 000 per minute), while the primary serves around 6 000 transactions per second (roughly 340 000 per minute):\n\n![Transactions Per\nSecond](https://about.gitlab.com/images/scaling-the-gitlab-database/transactions.png)\n\nUnfortunately we don't have any data on the transaction rates prior to deploying pgbouncer and our database load balancer.\n\nAn up-to-date overview of our PostgreSQL statistics can be found at our [public\n\nGrafana dashboard][postgres-stats].\n\nSome of the settings we have set for pgbouncer are as follows:\n\n| Setting              | Value       |\n|----------------------|-------------|\n| default_pool_size    | 100         |\n| reserve_pool_size    | 5           |\n| reserve_pool_timeout | 3           |\n| max_client_conn      | 2048        |\n| pool_mode            | transaction |\n| server_idle_timeout  | 30          |\n\nWith that all said there is still some work left to be done such as:\n\nimplementing service discovery ([#2042][issue-2042]), improving how we check if a secondary is available ([#2866][issue-2866]), and ignoring secondaries that are too far behind the primary ([#2197][issue-2197]).\n\nIt's worth mentioning that we currently do not have any plans of turning our load balancing solution into a standalone library that you can use outside of\n\nGitLab, instead our focus is on providing a solid load balancing solution for\n\nGitLab EE.\n\nIf this has gotten you interested and you enjoy working with databases, improving application performance, and adding database-related features to\n\nGitLab (such as [service discovery][issue-2042]) you should definitely check out the [job opening][job-opening] and the [database specialist handbook entry][database-specialist] for more information.\n\n- [Max Connections](https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS)\n- [PgBouncer](https://pgbouncer.github.io/)\n- [Pgpool](http://pgpool.net/mediawiki/index.php/Main_Page)\n- [Hot Standby](https://www.postgresql.org/docs/9.6/static/hot-standby.html)\n- [Pgpool Comment Data](https://gitlab.com/gitlab-com/infrastructure/issues/259#note_23464570)\n- [HA Docs](https://docs.gitlab.com/ee/administration/postgresql/index.html)\n- [Makara](https://github.com/taskrabbit/makara)\n- [Makara Thread Safety Issue](https://github.com/taskrabbit/makara/issues/151)\n- [Load Balancing MR](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1283)\n- [Issue #2042](https://gitlab.com/gitlab-org/gitlab-ee/issues/2042)\n- [Issue #2866](https://gitlab.com/gitlab-org/gitlab-ee/issues/2866)\n- [Issue #2197](https://gitlab.com/gitlab-org/gitlab-ee/issues/2197)\n- [GitLab 9.0 Release](/releases/2017/03/22/gitlab-9-0-released/)\n- [Load Balancing Docs](https://docs.gitlab.com/ee/administration/database_load_balancing.html)\n- [Postgres Stats Dashboard](https://dashboards.gitlab.com/dashboard/db/postgresql-overview?refresh=5m&orgId=1)\n- [Hot Standby Conflicts](https://www.postgresql.org/docs/current/static/hot-standby.html#HOT-STANDBY-CONFLICT)\n- [Citus](https://www.citusdata.com/)\n- [Octopus](https://github.com/thiagopradi/octopus)\n- [Crunchy Data](https://www.crunchydata.com/)\n- [Database Specialist Handbook](https://handbook.gitlab.com/handbook/engineering/infrastructure/database/)\n- [Database Engineer Job Opening](/job-families/engineering/database-engineer/)\n- [Issue #1448](https://gitlab.com/gitlab-com/infrastructure/issues/1448)\n- [Gstark](https://gitlab.com/_stark)",[23,24],"inside GitLab","production","yml",{},true,"/en-us/blog/scaling-the-gitlab-database",{"ogTitle":15,"ogImage":19,"ogDescription":16,"ogSiteName":30,"noIndex":12,"ogType":31,"ogUrl":32,"title":15,"canonicalUrls":32,"description":16},"https://about.gitlab.com","article","https://about.gitlab.com/blog/scaling-the-gitlab-database","en-us/blog/scaling-the-gitlab-database",[35,24],"inside-gitlab","BvD1D7a-8bmo7NIMRnO8GOpAh4WNWiG1r0GJz13UDYg",{"data":38},{"logo":39,"freeTrial":44,"sales":49,"login":54,"items":59,"search":367,"minimal":398,"duo":417,"pricingDeployment":427},{"config":40},{"href":41,"dataGaName":42,"dataGaLocation":43},"/","gitlab logo","header",{"text":45,"config":46},"Get free trial",{"href":47,"dataGaName":48,"dataGaLocation":43},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":50,"config":51},"Talk to sales",{"href":52,"dataGaName":53,"dataGaLocation":43},"/sales/","sales",{"text":55,"config":56},"Sign in",{"href":57,"dataGaName":58,"dataGaLocation":43},"https://gitlab.com/users/sign_in/","sign in",[60,87,182,187,288,348],{"text":61,"config":62,"cards":64},"Platform",{"dataNavLevelOne":63},"platform",[65,71,79],{"title":61,"description":66,"link":67},"The intelligent orchestration platform for DevSecOps",{"text":68,"config":69},"Explore our Platform",{"href":70,"dataGaName":63,"dataGaLocation":43},"/platform/",{"title":72,"description":73,"link":74},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":75,"config":76},"Meet GitLab Duo",{"href":77,"dataGaName":78,"dataGaLocation":43},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":80,"description":81,"link":82},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":83,"config":84},"Learn more",{"href":85,"dataGaName":86,"dataGaLocation":43},"/why-gitlab/","why gitlab",{"text":88,"left":27,"config":89,"link":91,"lists":95,"footer":164},"Product",{"dataNavLevelOne":90},"solutions",{"text":92,"config":93},"View all Solutions",{"href":94,"dataGaName":90,"dataGaLocation":43},"/solutions/",[96,120,143],{"title":97,"description":98,"link":99,"items":104},"Automation","CI/CD and automation to accelerate deployment",{"config":100},{"icon":101,"href":102,"dataGaName":103,"dataGaLocation":43},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[105,109,112,116],{"text":106,"config":107},"CI/CD",{"href":108,"dataGaLocation":43,"dataGaName":106},"/solutions/continuous-integration/",{"text":72,"config":110},{"href":77,"dataGaLocation":43,"dataGaName":111},"gitlab duo agent platform - product menu",{"text":113,"config":114},"Source Code Management",{"href":115,"dataGaLocation":43,"dataGaName":113},"/solutions/source-code-management/",{"text":117,"config":118},"Automated Software Delivery",{"href":102,"dataGaLocation":43,"dataGaName":119},"Automated software delivery",{"title":121,"description":122,"link":123,"items":128},"Security","Deliver code faster without compromising security",{"config":124},{"href":125,"dataGaName":126,"dataGaLocation":43,"icon":127},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[129,133,138],{"text":130,"config":131},"Application Security Testing",{"href":125,"dataGaName":132,"dataGaLocation":43},"Application security testing",{"text":134,"config":135},"Software Supply Chain Security",{"href":136,"dataGaLocation":43,"dataGaName":137},"/solutions/supply-chain/","Software supply chain security",{"text":139,"config":140},"Software Compliance",{"href":141,"dataGaName":142,"dataGaLocation":43},"/solutions/software-compliance/","software compliance",{"title":144,"link":145,"items":150},"Measurement",{"config":146},{"icon":147,"href":148,"dataGaName":149,"dataGaLocation":43},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[151,155,159],{"text":152,"config":153},"Visibility & Measurement",{"href":148,"dataGaLocation":43,"dataGaName":154},"Visibility and Measurement",{"text":156,"config":157},"Value Stream Management",{"href":158,"dataGaLocation":43,"dataGaName":156},"/solutions/value-stream-management/",{"text":160,"config":161},"Analytics & Insights",{"href":162,"dataGaLocation":43,"dataGaName":163},"/solutions/analytics-and-insights/","Analytics and insights",{"title":165,"items":166},"GitLab for",[167,172,177],{"text":168,"config":169},"Enterprise",{"href":170,"dataGaLocation":43,"dataGaName":171},"/enterprise/","enterprise",{"text":173,"config":174},"Small Business",{"href":175,"dataGaLocation":43,"dataGaName":176},"/small-business/","small business",{"text":178,"config":179},"Public Sector",{"href":180,"dataGaLocation":43,"dataGaName":181},"/solutions/public-sector/","public sector",{"text":183,"config":184},"Pricing",{"href":185,"dataGaName":186,"dataGaLocation":43,"dataNavLevelOne":186},"/pricing/","pricing",{"text":188,"config":189,"link":191,"lists":195,"feature":275},"Resources",{"dataNavLevelOne":190},"resources",{"text":192,"config":193},"View all resources",{"href":194,"dataGaName":190,"dataGaLocation":43},"/resources/",[196,229,247],{"title":197,"items":198},"Getting started",[199,204,209,214,219,224],{"text":200,"config":201},"Install",{"href":202,"dataGaName":203,"dataGaLocation":43},"/install/","install",{"text":205,"config":206},"Quick start guides",{"href":207,"dataGaName":208,"dataGaLocation":43},"/get-started/","quick setup checklists",{"text":210,"config":211},"Learn",{"href":212,"dataGaLocation":43,"dataGaName":213},"https://university.gitlab.com/","learn",{"text":215,"config":216},"Product documentation",{"href":217,"dataGaName":218,"dataGaLocation":43},"https://docs.gitlab.com/","product documentation",{"text":220,"config":221},"Best practice videos",{"href":222,"dataGaName":223,"dataGaLocation":43},"/getting-started-videos/","best practice videos",{"text":225,"config":226},"Integrations",{"href":227,"dataGaName":228,"dataGaLocation":43},"/integrations/","integrations",{"title":230,"items":231},"Discover",[232,237,242],{"text":233,"config":234},"Customer success stories",{"href":235,"dataGaName":236,"dataGaLocation":43},"/customers/","customer success stories",{"text":238,"config":239},"Blog",{"href":240,"dataGaName":241,"dataGaLocation":43},"/blog/","blog",{"text":243,"config":244},"Remote",{"href":245,"dataGaName":246,"dataGaLocation":43},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":248,"items":249},"Connect",[250,255,260,265,270],{"text":251,"config":252},"GitLab Services",{"href":253,"dataGaName":254,"dataGaLocation":43},"/services/","services",{"text":256,"config":257},"Community",{"href":258,"dataGaName":259,"dataGaLocation":43},"/community/","community",{"text":261,"config":262},"Forum",{"href":263,"dataGaName":264,"dataGaLocation":43},"https://forum.gitlab.com/","forum",{"text":266,"config":267},"Events",{"href":268,"dataGaName":269,"dataGaLocation":43},"/events/","events",{"text":271,"config":272},"Partners",{"href":273,"dataGaName":274,"dataGaLocation":43},"/partners/","partners",{"backgroundColor":276,"textColor":277,"text":278,"image":279,"link":283},"#2f2a6b","#fff","Insights for the future of software development",{"altText":280,"config":281},"the source promo card",{"src":282},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758208064/dzl0dbift9xdizyelkk4.svg",{"text":284,"config":285},"Read the latest",{"href":286,"dataGaName":287,"dataGaLocation":43},"/the-source/","the source",{"text":289,"config":290,"lists":292},"Company",{"dataNavLevelOne":291},"company",[293],{"items":294},[295,300,306,308,313,318,323,328,333,338,343],{"text":296,"config":297},"About",{"href":298,"dataGaName":299,"dataGaLocation":43},"/company/","about",{"text":301,"config":302,"footerGa":305},"Jobs",{"href":303,"dataGaName":304,"dataGaLocation":43},"/jobs/","jobs",{"dataGaName":304},{"text":266,"config":307},{"href":268,"dataGaName":269,"dataGaLocation":43},{"text":309,"config":310},"Leadership",{"href":311,"dataGaName":312,"dataGaLocation":43},"/company/team/e-group/","leadership",{"text":314,"config":315},"Team",{"href":316,"dataGaName":317,"dataGaLocation":43},"/company/team/","team",{"text":319,"config":320},"Handbook",{"href":321,"dataGaName":322,"dataGaLocation":43},"https://handbook.gitlab.com/","handbook",{"text":324,"config":325},"Investor relations",{"href":326,"dataGaName":327,"dataGaLocation":43},"https://ir.gitlab.com/","investor relations",{"text":329,"config":330},"Trust Center",{"href":331,"dataGaName":332,"dataGaLocation":43},"/security/","trust center",{"text":334,"config":335},"AI Transparency Center",{"href":336,"dataGaName":337,"dataGaLocation":43},"/ai-transparency-center/","ai transparency center",{"text":339,"config":340},"Newsletter",{"href":341,"dataGaName":342,"dataGaLocation":43},"/company/contact/#contact-forms","newsletter",{"text":344,"config":345},"Press",{"href":346,"dataGaName":347,"dataGaLocation":43},"/press/","press",{"text":349,"config":350,"lists":351},"Contact us",{"dataNavLevelOne":291},[352],{"items":353},[354,357,362],{"text":50,"config":355},{"href":52,"dataGaName":356,"dataGaLocation":43},"talk to sales",{"text":358,"config":359},"Support portal",{"href":360,"dataGaName":361,"dataGaLocation":43},"https://support.gitlab.com","support portal",{"text":363,"config":364},"Customer portal",{"href":365,"dataGaName":366,"dataGaLocation":43},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":368,"login":369,"suggestions":376},"Close",{"text":370,"link":371},"To search repositories and projects, login to",{"text":372,"config":373},"gitlab.com",{"href":57,"dataGaName":374,"dataGaLocation":375},"search login","search",{"text":377,"default":378},"Suggestions",[379,381,385,387,391,395],{"text":72,"config":380},{"href":77,"dataGaName":72,"dataGaLocation":375},{"text":382,"config":383},"Code Suggestions (AI)",{"href":384,"dataGaName":382,"dataGaLocation":375},"/solutions/code-suggestions/",{"text":106,"config":386},{"href":108,"dataGaName":106,"dataGaLocation":375},{"text":388,"config":389},"GitLab on AWS",{"href":390,"dataGaName":388,"dataGaLocation":375},"/partners/technology-partners/aws/",{"text":392,"config":393},"GitLab on Google Cloud",{"href":394,"dataGaName":392,"dataGaLocation":375},"/partners/technology-partners/google-cloud-platform/",{"text":396,"config":397},"Why GitLab?",{"href":85,"dataGaName":396,"dataGaLocation":375},{"freeTrial":399,"mobileIcon":404,"desktopIcon":409,"secondaryButton":412},{"text":400,"config":401},"Start free trial",{"href":402,"dataGaName":48,"dataGaLocation":403},"https://gitlab.com/-/trials/new/","nav",{"altText":405,"config":406},"Gitlab Icon",{"src":407,"dataGaName":408,"dataGaLocation":403},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":405,"config":410},{"src":411,"dataGaName":408,"dataGaLocation":403},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":413,"config":414},"Get Started",{"href":415,"dataGaName":416,"dataGaLocation":403},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":418,"mobileIcon":423,"desktopIcon":425},{"text":419,"config":420},"Learn more about GitLab Duo",{"href":421,"dataGaName":422,"dataGaLocation":403},"/gitlab-duo/","gitlab duo",{"altText":405,"config":424},{"src":407,"dataGaName":408,"dataGaLocation":403},{"altText":405,"config":426},{"src":411,"dataGaName":408,"dataGaLocation":403},{"freeTrial":428,"mobileIcon":433,"desktopIcon":435},{"text":429,"config":430},"Back to pricing",{"href":185,"dataGaName":431,"dataGaLocation":403,"icon":432},"back to pricing","GoBack",{"altText":405,"config":434},{"src":407,"dataGaName":408,"dataGaLocation":403},{"altText":405,"config":436},{"src":411,"dataGaName":408,"dataGaLocation":403},{"title":438,"button":439,"config":444},"See how agentic AI transforms software delivery",{"text":440,"config":441},"Watch GitLab Transcend now",{"href":442,"dataGaName":443,"dataGaLocation":43},"/events/transcend/virtual/","transcend event",{"layout":445,"icon":446},"release","AiStar",{"data":448},{"text":449,"source":450,"edit":456,"contribute":461,"config":466,"items":471,"minimal":678},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":451,"config":452},"View page source",{"href":453,"dataGaName":454,"dataGaLocation":455},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":457,"config":458},"Edit this page",{"href":459,"dataGaName":460,"dataGaLocation":455},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":462,"config":463},"Please contribute",{"href":464,"dataGaName":465,"dataGaLocation":455},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":467,"facebook":468,"youtube":469,"linkedin":470},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[472,519,573,617,644],{"title":183,"links":473,"subMenu":488},[474,478,483],{"text":475,"config":476},"View plans",{"href":185,"dataGaName":477,"dataGaLocation":455},"view plans",{"text":479,"config":480},"Why Premium?",{"href":481,"dataGaName":482,"dataGaLocation":455},"/pricing/premium/","why premium",{"text":484,"config":485},"Why Ultimate?",{"href":486,"dataGaName":487,"dataGaLocation":455},"/pricing/ultimate/","why ultimate",[489],{"title":490,"links":491},"Contact Us",[492,495,497,499,504,509,514],{"text":493,"config":494},"Contact sales",{"href":52,"dataGaName":53,"dataGaLocation":455},{"text":358,"config":496},{"href":360,"dataGaName":361,"dataGaLocation":455},{"text":363,"config":498},{"href":365,"dataGaName":366,"dataGaLocation":455},{"text":500,"config":501},"Status",{"href":502,"dataGaName":503,"dataGaLocation":455},"https://status.gitlab.com/","status",{"text":505,"config":506},"Terms of use",{"href":507,"dataGaName":508,"dataGaLocation":455},"/terms/","terms of use",{"text":510,"config":511},"Privacy statement",{"href":512,"dataGaName":513,"dataGaLocation":455},"/privacy/","privacy statement",{"text":515,"config":516},"Cookie preferences",{"dataGaName":517,"dataGaLocation":455,"id":518,"isOneTrustButton":27},"cookie preferences","ot-sdk-btn",{"title":88,"links":520,"subMenu":529},[521,525],{"text":522,"config":523},"DevSecOps platform",{"href":70,"dataGaName":524,"dataGaLocation":455},"devsecops platform",{"text":526,"config":527},"AI-Assisted Development",{"href":421,"dataGaName":528,"dataGaLocation":455},"ai-assisted development",[530],{"title":531,"links":532},"Topics",[533,538,543,548,553,558,563,568],{"text":534,"config":535},"CICD",{"href":536,"dataGaName":537,"dataGaLocation":455},"/topics/ci-cd/","cicd",{"text":539,"config":540},"GitOps",{"href":541,"dataGaName":542,"dataGaLocation":455},"/topics/gitops/","gitops",{"text":544,"config":545},"DevOps",{"href":546,"dataGaName":547,"dataGaLocation":455},"/topics/devops/","devops",{"text":549,"config":550},"Version Control",{"href":551,"dataGaName":552,"dataGaLocation":455},"/topics/version-control/","version control",{"text":554,"config":555},"DevSecOps",{"href":556,"dataGaName":557,"dataGaLocation":455},"/topics/devsecops/","devsecops",{"text":559,"config":560},"Cloud Native",{"href":561,"dataGaName":562,"dataGaLocation":455},"/topics/cloud-native/","cloud native",{"text":564,"config":565},"AI for Coding",{"href":566,"dataGaName":567,"dataGaLocation":455},"/topics/devops/ai-for-coding/","ai for coding",{"text":569,"config":570},"Agentic AI",{"href":571,"dataGaName":572,"dataGaLocation":455},"/topics/agentic-ai/","agentic ai",{"title":574,"links":575},"Solutions",[576,578,580,585,589,592,596,599,601,604,607,612],{"text":130,"config":577},{"href":125,"dataGaName":130,"dataGaLocation":455},{"text":119,"config":579},{"href":102,"dataGaName":103,"dataGaLocation":455},{"text":581,"config":582},"Agile development",{"href":583,"dataGaName":584,"dataGaLocation":455},"/solutions/agile-delivery/","agile delivery",{"text":586,"config":587},"SCM",{"href":115,"dataGaName":588,"dataGaLocation":455},"source code management",{"text":534,"config":590},{"href":108,"dataGaName":591,"dataGaLocation":455},"continuous integration & delivery",{"text":593,"config":594},"Value stream management",{"href":158,"dataGaName":595,"dataGaLocation":455},"value stream management",{"text":539,"config":597},{"href":598,"dataGaName":542,"dataGaLocation":455},"/solutions/gitops/",{"text":168,"config":600},{"href":170,"dataGaName":171,"dataGaLocation":455},{"text":602,"config":603},"Small business",{"href":175,"dataGaName":176,"dataGaLocation":455},{"text":605,"config":606},"Public sector",{"href":180,"dataGaName":181,"dataGaLocation":455},{"text":608,"config":609},"Education",{"href":610,"dataGaName":611,"dataGaLocation":455},"/solutions/education/","education",{"text":613,"config":614},"Financial services",{"href":615,"dataGaName":616,"dataGaLocation":455},"/solutions/finance/","financial services",{"title":188,"links":618},[619,621,623,625,628,630,632,634,636,638,640,642],{"text":200,"config":620},{"href":202,"dataGaName":203,"dataGaLocation":455},{"text":205,"config":622},{"href":207,"dataGaName":208,"dataGaLocation":455},{"text":210,"config":624},{"href":212,"dataGaName":213,"dataGaLocation":455},{"text":215,"config":626},{"href":217,"dataGaName":627,"dataGaLocation":455},"docs",{"text":238,"config":629},{"href":240,"dataGaName":241,"dataGaLocation":455},{"text":233,"config":631},{"href":235,"dataGaName":236,"dataGaLocation":455},{"text":243,"config":633},{"href":245,"dataGaName":246,"dataGaLocation":455},{"text":251,"config":635},{"href":253,"dataGaName":254,"dataGaLocation":455},{"text":256,"config":637},{"href":258,"dataGaName":259,"dataGaLocation":455},{"text":261,"config":639},{"href":263,"dataGaName":264,"dataGaLocation":455},{"text":266,"config":641},{"href":268,"dataGaName":269,"dataGaLocation":455},{"text":271,"config":643},{"href":273,"dataGaName":274,"dataGaLocation":455},{"title":289,"links":645},[646,648,650,652,654,656,658,662,667,669,671,673],{"text":296,"config":647},{"href":298,"dataGaName":291,"dataGaLocation":455},{"text":301,"config":649},{"href":303,"dataGaName":304,"dataGaLocation":455},{"text":309,"config":651},{"href":311,"dataGaName":312,"dataGaLocation":455},{"text":314,"config":653},{"href":316,"dataGaName":317,"dataGaLocation":455},{"text":319,"config":655},{"href":321,"dataGaName":322,"dataGaLocation":455},{"text":324,"config":657},{"href":326,"dataGaName":327,"dataGaLocation":455},{"text":659,"config":660},"Sustainability",{"href":661,"dataGaName":659,"dataGaLocation":455},"/sustainability/",{"text":663,"config":664},"Diversity, inclusion and belonging (DIB)",{"href":665,"dataGaName":666,"dataGaLocation":455},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":329,"config":668},{"href":331,"dataGaName":332,"dataGaLocation":455},{"text":339,"config":670},{"href":341,"dataGaName":342,"dataGaLocation":455},{"text":344,"config":672},{"href":346,"dataGaName":347,"dataGaLocation":455},{"text":674,"config":675},"Modern Slavery Transparency Statement",{"href":676,"dataGaName":677,"dataGaLocation":455},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":679},[680,683,686],{"text":681,"config":682},"Terms",{"href":507,"dataGaName":508,"dataGaLocation":455},{"text":684,"config":685},"Cookies",{"dataGaName":517,"dataGaLocation":455,"id":518,"isOneTrustButton":27},{"text":687,"config":688},"Privacy",{"href":512,"dataGaName":513,"dataGaLocation":455},[690],{"id":691,"title":18,"body":8,"config":692,"content":694,"description":8,"extension":25,"meta":698,"navigation":27,"path":699,"seo":700,"stem":701,"__hash__":702},"blogAuthors/en-us/blog/authors/yorick-peterse.yml",{"template":693},"BlogAuthor",{"name":18,"config":695},{"headshot":696,"ctfId":697},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749659488/Blog/Author%20Headshots/gitlab-logo-extra-whitespace.png","Yorick-Peterse",{},"/en-us/blog/authors/yorick-peterse",{},"en-us/blog/authors/yorick-peterse","VGbmFb88hdwYhr9lyGHjbwPUUHPQ6npgsw4txxlgB14",[704,719,732],{"content":705,"config":717},{"title":706,"description":707,"authors":708,"heroImage":710,"date":711,"body":712,"category":9,"tags":713},"How to use GitLab Container Virtual Registry with Docker Hardened Images","Learn how to simplify container image management with this step-by-step guide.",[709],"Tim Rizzi","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772111172/mwhgbjawn62kymfwrhle.png","2026-03-12","If you're a platform engineer, you've probably had this conversation:\n  \n*\"Security says we need to use hardened base images.\"*\n\n*\"Great, where do I configure credentials for yet another registry?\"*\n\n*\"Also, how do we make sure everyone actually uses them?\"*\n\nOr this one:\n\n*\"Why are our builds so slow?\"*\n\n*\"We're pulling the same 500MB image from Docker Hub in every single job.\"*\n\n*\"Can't we just cache these somewhere?\"*\n\nI've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries — Docker Hub, dhi.io (Docker Hardened Images), MCR, and Quay — and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. Your developers don't need to know or care which upstream a particular image came from.\n\nThis article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that's a combination that makes a lot of sense for teams concerned about security and not making their developers' lives harder.\n\n## What problem are we actually solving?\n\nThe Platform teams I usually talk to manage container images across three to five registries:\n\n* **Docker Hub** for most base images\n* **dhi.io** for Docker Hardened Images (security-conscious workloads)\n* **MCR** for .NET and Azure tooling\n* **Quay.io** for Red Hat ecosystem stuff\n* **Internal registries** for proprietary images\n\nEach one has its own:\n\n* Authentication mechanism\n* Network latency characteristics\n* Way of organizing image paths\n\nYour CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.\n\nContainer Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.\n\n## How it works\n\nThe model is straightforward:\n\n```text\nYour pipeline pulls:\n  gitlab.com/virtual_registries/container/1000016/python:3.13\n\nVirtual registry checks:\n  1. Do I have this cached? → Return it\n  2. No? → Fetch from upstream, cache it, return it\n\n```\n\nYou configure upstreams in priority order. When a pull request comes in, the virtual registry checks each upstream until it finds the image. The result gets cached for a configurable period (default 24 hours).\n\n```text\n┌─────────────────────────────────────────────────────────┐\n│                    CI/CD Pipeline                       │\n│                          │                              │\n│                          ▼                              │\n│   gitlab.com/virtual_registries/container/\u003Cid>/image   │\n└─────────────────────────────────────────────────────────┘\n                           │\n                           ▼\n┌─────────────────────────────────────────────────────────┐\n│            Container Virtual Registry                   │\n│                                                         │\n│  Upstream 1: Docker Hub ────────────────┐               │\n│  Upstream 2: dhi.io (Hardened) ────────┐│               │\n│  Upstream 3: MCR ─────────────────────┐││               │\n│  Upstream 4: Quay.io ────────────────┐│││               │\n│                                      ││││               │\n│                    ┌─────────────────┴┴┴┴──┐            │\n│                    │        Cache          │            │\n│                    │  (manifests + layers) │            │\n│                    └───────────────────────┘            │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficulat to tell if teams are actually using hardened images vs. regular ones.\n\nVirtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No CI/CD changes per-team**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. Useful for compliance, useful for understanding what your fleet actually depends on.\n\n## Setting it up\n\nHere's a real setup using the Python client from this demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for Red Hat stuff\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\n  \nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n    \n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n    \n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n    \n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\n\n\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\nHere are some considerations to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently-updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI—no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. It works, people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n## Related resources\n- [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n- [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n- [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[714,715,716],"tutorial","product","features",{"featured":12,"template":13,"slug":718},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"content":720,"config":730},{"title":721,"description":722,"authors":723,"heroImage":725,"date":726,"category":9,"tags":727,"body":729},"How IIT Bombay students are coding the future with GitLab","At GitLab, we often talk about how software accelerates innovation. But sometimes, you have to step away from the Zoom calls and stand in a crowded university hall to remember why we do this.",[724],"Nick Veenhof","https://res.cloudinary.com/about-gitlab-com/image/upload/v1750099013/Blog/Hero%20Images/Blog/Hero%20Images/blog-image-template-1800x945%20%2814%29_6VTUA8mUhOZNDaRVNPeKwl_1750099012960.png","2026-01-08",[259,611,728],"open source","The GitLab team recently had the privilege of judging the **iHack Hackathon** at **IIT Bombay's E-Summit**. The energy was electric, the coffee was flowing, and the talent was undeniable. But what struck us most wasn't just the code — it was the sheer determination of students to solve real-world problems, often overcoming significant logistical and financial hurdles to simply be in the room.\n\n\nThrough our [GitLab for Education program](https://about.gitlab.com/solutions/education/), we aim to empower the next generation of developers with tools and opportunity. Here is a look at what the students built, and how they used GitLab to bridge the gap between idea and reality.\n\n## The challenge: Build faster, build securely\n\nThe premise for the GitLab track of the hackathon was simple: Don't just show us a product; show us how you built it. We wanted to see how students utilized GitLab's platform — from Issue Boards to CI/CD pipelines — to accelerate the development lifecycle.\n\nThe results were inspiring.\n\n## The winners\n\n### 1st place: Team Decode — Democratizing Scientific Research\n\n**Project:** FIRE (Fast Integrated Research Environment)\n\nTeam Decode took home the top prize with a solution that warms a developer's heart: a local-first, blazing-fast data processing tool built with [Rust](https://about.gitlab.com/blog/secure-rust-development-with-gitlab/) and Tauri. They identified a massive pain point for data science students: existing tools are fragmented, slow, and expensive.\n\nTheir solution, FIRE, allows researchers to visualize complex formats (like NetCDF) instantly. What impressed the judges most was their \"hacker\" ethos. They didn't just build a tool; they built it to be open and accessible.\n\n**How they used GitLab:** Since the team lived far apart, asynchronous communication was key. They utilized **GitLab Issue Boards** and **Milestones** to track progress and integrated their repo with Telegram to get real-time push notifications. As one team member noted, \"Coordinating all these technologies was really difficult, and what helped us was GitLab... the Issue Board really helped us track who was doing what.\"\n\n![Team Decode](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/epqazj1jc5c7zkgqun9h.jpg)\n\n### 2nd place: Team BichdeHueDost — Reuniting to Solve Payments\n\n**Project:** SemiPay (RFID Cashless Payment for Schools)\n\nThe team name, BichdeHueDost, translates to \"Friends who have been set apart.\" It's a fitting name for a group of friends who went to different colleges but reunited to build this project. They tackled a unique problem: handling cash in schools for young children. Their solution used RFID cards backed by a blockchain ledger to ensure secure, cashless transactions for students.\n\n**How they used GitLab:** They utilized [GitLab CI/CD](https://about.gitlab.com/topics/ci-cd/) to automate the build process for their Flutter application (APK), ensuring that every commit resulted in a testable artifact. This allowed them to iterate quickly despite the \"flaky\" nature of cross-platform mobile development.\n\n![Team BichdeHueDost](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/pkukrjgx2miukb6nrj5g.jpg)\n\n### 3rd place: Team ZenYukti — Agentic Repository Intelligence\n\n**Project:** RepoInsight AI (AI-powered, GitLab-native intelligence platform)\n\nTeam ZenYukti impressed us with a solution that tackles a universal developer pain point: understanding unfamiliar codebases. What stood out to the judges was the tool's practical approach to onboarding and code comprehension: RepoInsight-AI automatically generates documentation, visualizes repository structure, and even helps identify bugs, all while maintaining context about the entire codebase.\n\n**How they used GitLab:** The team built a comprehensive CI/CD pipeline that showcased GitLab's security and DevOps capabilities. They integrated [GitLab's Security Templates](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/templates/Security) (SAST, Dependency Scanning, and Secret Detection), and utilized [GitLab Container Registry](https://docs.gitlab.com/user/packages/container_registry/) to manage their Docker images for backend and frontend components. They created an AI auto-review bot that runs on merge requests, demonstrating an \"agentic workflow\" where AI assists in the development process itself.\n\n![Team ZenYukti](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/ymlzqoruv5al1secatba.jpg)\n\n## Beyond the code: A lesson in inclusion\n\nWhile the code was impressive, the most powerful moment of the event happened away from the keyboard.\n\nDuring the feedback session, we learned about the journey Team ZenYukti took to get to Mumbai. They traveled over 24 hours, covering nearly 1,800 kilometers. Because flights were too expensive and trains were booked, they traveled in the \"General Coach,\" a non-reserved, severely overcrowded carriage.\n\nAs one student described it:\n\n*\"You cannot even imagine something like this... there are no seats... people sit on the top of the train. This is what we have endured.\"*\n\nThis hit home. [Diversity, Inclusion, and Belonging](https://handbook.gitlab.com/handbook/company/culture/inclusion/) are core values at GitLab. We realized that for these students, the barrier to entry wasn't intellect or skill, it was access.\n\nIn that moment, we decided to break that barrier. We committed to reimbursing the travel expenses for the participants who struggled to get there. It's a small step, but it underlines a massive truth: **talent is distributed equally, but opportunity is not.**\n\n![hackathon class together](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380252/o5aqmboquz8ehusxvgom.jpg)\n\n### The future is bright (and automated)\n\nWe also saw incredible potential in teams like Prometheus, who attempted to build an autonomous patch remediation tool (DevGuardian), and Team Arrakis, who built a voice-first job portal for blue-collar workers using [GitLab Duo](https://about.gitlab.com/gitlab-duo/) to troubleshoot their pipelines.\n\nTo all the students who participated: You are the future. Through [GitLab for Education](https://about.gitlab.com/solutions/education/), we are committed to providing you with the top-tier tools (like GitLab Ultimate) you need to learn, collaborate, and change the world — whether you are coding from a dorm room, a lab, or a train carriage. **Keep shipping.**\n\n> :bulb: Learn more about the [GitLab for Education program](https://about.gitlab.com/solutions/education/).\n",{"slug":731,"featured":12,"template":13},"how-iit-bombay-students-code-future-with-gitlab",{"content":733,"config":741},{"title":734,"description":735,"authors":736,"heroImage":737,"date":738,"category":9,"tags":739,"body":740},"Artois University elevates research and curriculum with GitLab Ultimate for Education","Artois University's CRIL leveraged the GitLab for Education program to gain free access to Ultimate, transforming advanced research and computer science curricula.",[724],"https://res.cloudinary.com/about-gitlab-com/image/upload/v1750099203/Blog/Hero%20Images/Blog/Hero%20Images/blog-image-template-1800x945%20%2820%29_2bJGC5ZP3WheoqzlLT05C5_1750099203484.png","2025-12-10",[611,259,715],"Leading academic institutions face a critical challenge: how to provide thousands of students and researchers with industry-standard, **full-featured DevSecOps tools** without compromising institutional control. Many start with basic version control, but the modern curriculum demands integrated capabilities for planning, security, and advanced CI/CD.\n\nThe **GitLab for Education program** is designed to solve this by providing access to **GitLab Ultimate** for qualifying institutions, allowing them to scale their operations and elevate their academic offerings. \n\nThis article showcases a powerful success story from the **Centre de Recherche en Informatique de Lens (CRIL)**, a joint laboratory of **Artois University** and CNRS in France. After years of relying solely on GitLab Community Edition (CE), the university's move to GitLab Ultimate through the GitLab for Education program immediately unlocked advanced capabilities, transforming their teaching, research, and contribution workflows virtually overnight. This story demonstrates why GitLab Ultimate is essential for institutions seeking to deliver advanced computer science and research curricula.\n\n## GitLab Ultimate unlocked: Managing scale and driving academic value\n\n**Artois University's** self-managed GitLab instance is a large-scale operation, supporting nearly **3,000 users** across approximately **19,000 projects**, primarily serving computer science students and researchers. While GitLab Community Edition was robust, the upgrade to GitLab Ultimate provided the sophisticated tooling necessary for managing this scale and facilitating advanced university-level work.\n\n***\"We can see the difference,\" says Daniel Le Berre, head of research at CRIL and the instance maintainer. \"It's a completely different product. Each week reveals new features that directly enhance our productivity and teaching.\"***\n\nThe institution joined the GitLab for Education program specifically because it covers both **instructional and non-commercial research use cases** and offers full access to Ultimate's features, removing significant cost barriers.\n\n### Key GitLab Ultimate benefits for students and researchers\n\n* **Advanced project management at scale:** Master's students now benefit from **GitLab Ultimate's project planning features**. This enables them to structure, track, and manage complex, long-term research projects using professional methodologies like portfolio management and advanced issue tracking that seamlessly roll up across their thousands of projects.\n\n* **Enhanced visibility:** Features like improved dashboards and code previews directly in Markdown files dramatically streamline tracking and documentation review, reducing administrative friction for both instructors and students managing large project loads.\n\n## Comprehensive curriculum: From concepts to continuous delivery\n\nGitLab Ultimate is deeply integrated into the computer science curriculum, moving students beyond simple `git` commands to practical **DevSecOps implementation**.\n\n* **Git fundamentals:** Students begin by visualizing concepts using open-source tools to master Git concepts.\n\n* **Full CI/CD implementation:** Students use GitLab CI for rigorous **Test-Driven Development (TDD)** in their software projects. They learn to build, test, and perform quality assurance using unit and integration testing pipelines—core competency made seamless by the integrated platform.\n\n* **DevSecOps for research and documentation:** The university teaches students that DevSecOps principles are vital for all collaborative work. Inspired by earlier work in Delft, students manage and produce critical research documentation (PDFs from Markdown files) using GitLab, incorporating quality checks like linters and spell checks directly in the CI pipeline. This ensures high-quality, reproducible research output.\n\n* **Future-proofing security skills:** The GitLab Ultimate platform immediately positions the institution to incorporate advanced DevSecOps features like SAST and DAST scanning as their research and development code projects grow, ensuring students are prepared for industry security standards.\n\n## Accelerating open source contributions with GitLab Duo\n\nAccess to the full GitLab platform, including our AI capabilities, has empowered students to make impactful contributions to the wider open source community faster than ever before.\n\nTwo Master's students recently completed direct contributions to the GitLab product, adding the **ORCID identifier** into user profiles. Working on GitLab.com, they leveraged **GitLab Duo's AI chat and code suggestions** to navigate the codebase efficiently.\n\n***\"This would not have been possible without GitLab Duo,\" Daniel Le Berre notes. \"The AI features helped students, who might have lacked deep codebase knowledge, deliver meaningful contributions in just two weeks.\"***\n\nThis demonstrates how providing students with cutting-edge tools **accelerates their learning and impact**, allowing them to translate classroom knowledge into real-world contributions immediately.\n\n## Empowering open research and institutional control\n\nThe stability of the self-managed instance at Artois University is key to its success. This model guarantees **institutional control and stability** — a critical factor for long-term research preservation.\n\nThe institution's expertise in this area was recently highlighted in a major 2024 study led by CRIL, titled: \"[Higher Education and Research Forges in France - Definition, uses, limitations encountered and needs analysis](https://hal.science/hal-04208924v4)\" ([Project on GitLab](https://gitlab.in2p3.fr/coso-college-codes-sources-et-logiciels/forges-esr-en)). The research found that the vast majority of public forges in French Higher Education and Research relied on **GitLab**. This finding underscores the consensus among academic leaders that self-hosted solutions are essential for **data control and longevity**, especially when compared to relying on external, commercial forges.\n\n## Unlock GitLab Ultimate for your institution today\n\nThe success story of **Artois University's CRIL** proves the transformative power of the GitLab for Education program. By providing **free access to GitLab Ultimate**, we enable large-scale institutions to:\n\n1.  **Deliver a modern, integrated DevSecOps curriculum.**\n\n2.  **Support advanced, collaborative research projects with Ultimate planning features.**\n\n3.  **Empower students to make AI-assisted open source contributions.**\n\n4.  **Maintain institutional control and data longevity.**\n\nIf your academic institution is ready to equip its students and researchers with the complete DevSecOps platform and its most advanced features, we invite you to join the program.\n\nThe program provides **free access to GitLab Ultimate** for qualifying instructional and non-commercial research use cases.\n\n**Apply now [online](https://about.gitlab.com/solutions/education/join/).**\n",{"slug":742,"featured":27,"template":13},"artois-university-elevates-curriculum-with-gitlab-ultimate-for-education",{"promotions":744},[745,759,770],{"id":746,"categories":747,"header":749,"text":750,"button":751,"image":756},"ai-modernization",[748],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":752,"config":753},"Get your AI maturity score",{"href":754,"dataGaName":755,"dataGaLocation":241},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":757},{"src":758},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":760,"categories":761,"header":762,"text":750,"button":763,"image":767},"devops-modernization",[715,557],"Are you just managing tools or shipping innovation?",{"text":764,"config":765},"Get your DevOps maturity score",{"href":766,"dataGaName":755,"dataGaLocation":241},"/assessments/devops-modernization-assessment/",{"config":768},{"src":769},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":771,"categories":772,"header":774,"text":750,"button":775,"image":779},"security-modernization",[773],"security","Are you trading speed for security?",{"text":776,"config":777},"Get your security maturity score",{"href":778,"dataGaName":755,"dataGaLocation":241},"/assessments/security-modernization-assessment/",{"config":780},{"src":781},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"header":783,"blurb":784,"button":785,"secondaryButton":790},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":786,"config":787},"Get your free trial",{"href":788,"dataGaName":48,"dataGaLocation":789},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":493,"config":791},{"href":52,"dataGaName":53,"dataGaLocation":789},1773350822821]