When the d_YYYY_MM_DD eups distrib tags were introduced last year, they were never intended to be long lived, as there is no corresponding git tag. While they have been manually cleaned up once or twice, they had accumulated to the point that eups distrib install ... was spending an irritating amount of time downloading tags serially. As of today, daily tag cleanup has been automated: old tags are now retired by being moved into a sub-directory named old_tags. Cleanup happens across all eups.lsst.codes hosted eups package roots, including the “tarballs”.
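For illustration only, the retirement step amounts to something like the shell sketch below; the actual automation is not shown in this thread, and the package-root paths, the tags/ layout, and the 30-day cut-off are assumptions.

# Hedged sketch, not the real automation: retire stale daily-tag list
# files into old_tags/ under each hosted package root.
for root in /path/to/pkgroot/src /path/to/pkgroot/tarballs; do
    mkdir -p "$root/tags/old_tags"
    # d_YYYY_MM_DD.list files older than ~30 days get moved aside
    find "$root/tags" -maxdepth 1 -name 'd_*.list' -mtime +30 \
        -exec mv {} "$root/tags/old_tags/" \;
done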
That’s because of the robots.txt file. Use -U to avoid this (as I’ve said before).
This is the first I’ve heard of a robots.txt in relation to eups distrib, but there is not, nor has there ever been, a robots.txt on eups.lsst.codes.
$ curl -I https://eups.lsst.codes/robots.txt
HTTP/1.1 404 Not Found
Server: nginx/1.13.3
Date: Wed, 02 May 2018 17:15:04 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: keep-alive
$ curl -I https://eups.lsst.codes/stack/robots.txt
HTTP/1.1 404 Not Found
Server: nginx/1.13.3
Date: Wed, 02 May 2018 17:15:08 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: keep-alive
$ curl -I https://eups.lsst.codes/stack/src/robots.txt
HTTP/1.1 404 Not Found
Server: nginx/1.13.3
Date: Wed, 02 May 2018 17:15:11 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: keep-alive
$ curl -I https://eups.lsst.codes/stack/src/tags/robots.txt
HTTP/1.1 404 Not Found
Server: nginx/1.13.3
Date: Wed, 02 May 2018 17:15:14 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: keep-alive
To quote from #dm-square on 2017-06-21:
Robert Lupton [1:48 PM]
Who can remove/fix the robots.txt file on `https://sw.lsstcorp.org`?
…
Joshua Hoblitt [1:57 PM]
@rhl that is ncsa infrastructure
So the problem was brought to square’s attention then.
I don’t recall that discussion. Looking at the history, there is no problem description or mention of -U. Could you describe what problem -U and/or removal of robots.txt is supposed to resolve?
It was also made clear on 2017-06-21 that there is no robots.txt on eups.lsst.codes:
Joshua Hoblitt [10:36 AM]
@rhl I don't have administrative control of https://sw.lsstcorp.org/. I want to make https://eups.lsst.codes canonical but it looks like I failed to open an RFC on that.
there is no robots.txt on eups.lsst.codes
I think the problem is downloading large numbers of tag list files, and possibly doing that repeatedly during an installation. If the downloads were done using a mechanism that respected the robots.txt Crawl-delay directive, and such a directive were present, they could be very slow. However, it appears that the current code does not respect this directive, and the file does not exist anyway.
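Purely as a hypothetical sketch of that mechanism: a downloader that honoured Crawl-delay would pause between serial tag list fetches, roughly as below. The tag names and .list URLs are assumptions, and, as noted, neither the current code nor the server actually does this.

# Hypothetical only: look for a Crawl-delay value (none exists today)
# and sleep between serial tag list downloads.
delay=$(curl -sf https://eups.lsst.codes/robots.txt \
        | awk -F': *' 'tolower($1) == "crawl-delay" {print $2; exit}')
for tag in d_2018_04_30 d_2018_05_01 d_2018_05_02; do
    curl -sfO "https://eups.lsst.codes/stack/src/tags/${tag}.list"
    sleep "${delay:-0}"
done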
-U avoids the problem by not attempting to apply the tags in the first place and thus not needing to download any tag list files.
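For example, assuming -U can be combined with the usual tagged install invocation (the tag and product names here are illustrative, not a recommendation):

$ eups distrib install -U -t d_2018_05_01 lsst_distrib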