@mgifford This issue is separate from what you are describing. Separate enough it should be discussed in a separate place.
@catch Let me clarify my 3rd bullet in #37. When we create hashes and store them in the database, we shouldn't do that forever. Whether they live in the variable table or in their own table, this is space we don't need to keep filling up. We need a method to remove stale hash entries from the database.
My suggestion (and there may well be a better way) is to remove the hash data from the database when drupal_clear_css_cache or drupal_clear_js_cache is run, using logic similar to drupal_delete_file_if_stale to decide whether an item is stale and should be removed from the database.
How did you plan on removing stale hash entries?
@catch thanks for the detailed explanation. It helped clarify where you are coming from and I like it. So, here is what I see:
- drupal_get_css() stores the hash and what files are included + inserts script tag into page.
- When the js/css file is generated it looks up the files from the hash in the db and generates the file. Future requests will hit the file instead.
- For cleanup, when the css/js files are deleted after 30 days (or other set time) the record in the database is removed as well.
Does this sound about right?
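To make the three steps above concrete, here is a rough sketch in Python rather than Drupal code. The table name `aggregate_map`, the function names, and the use of SHA-256 over the file list are all assumptions for illustration; only the 30-day window comes from the discussion, and the staleness test mirrors the idea behind drupal_delete_file_if_stale.

```python
import hashlib
import json
import sqlite3

# 30 days in seconds, matching the window after which aggregate files are deleted.
STALE_AFTER = 30 * 24 * 60 * 60


def open_store():
    """Create a hypothetical hash -> file list table (in memory for the sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aggregate_map ("
        "hash TEXT PRIMARY KEY, files TEXT, last_used INTEGER)"
    )
    return conn


def record_aggregate(conn, files, now):
    """Page render step: store which files the hash stands for."""
    digest = hashlib.sha256("\n".join(files).encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO aggregate_map (hash, files, last_used) "
        "VALUES (?, ?, ?)",
        (digest, json.dumps(files), now),
    )
    return digest


def lookup_files(conn, digest):
    """File generation step: resolve a hash back to its file list, or None."""
    row = conn.execute(
        "SELECT files FROM aggregate_map WHERE hash = ?", (digest,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def purge_stale(conn, now):
    """Cleanup step: drop rows older than the stale window; returns rows removed."""
    cur = conn.execute(
        "DELETE FROM aggregate_map WHERE last_used < ?", (now - STALE_AFTER,)
    )
    return cur.rowcount
```

In this sketch the purge would run from the same place the stale aggregate files are deleted, so the database row and the file disappear together.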
@pounard Form tokens don't stop a ddos. If you want to stop a ddos get the IP (or range) doing the damage and block it at a level much lower than the application.
Maybe we could look into a method that uses the most recent modification time of the files to be aggregated as part of the process.
When the hash is created it could be built from the files + a unique site key + the query string (updated on each cache clear). That would make it easy to block people trying to use the combiner from the outside, and it should work with reverse proxies (I think).
When you update production you should NOT have trouble in your reverse proxy. We need to serve both the small and large clients in this.
Creating the hash would use the files plus some "predictable details". This could include elements like a private site key and css_js_query_string. I'm not entirely sure what would be appropriate here as we need to explore a little more.
If we did something like:
<script src="http://example.com/files/js/js_hash12342341234234?files=misc/jquery.js,misc/jquery.once.js,misc/drupal.js"></script>
We could generate the hash based on the file names + predictable details for the script tag. Then, when we go to generate the file we check the passed in hash against a new hash created based on the name + predictable details. If they match we generate the file. If not, we don't. This would stop misuse of this combiner.
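A minimal sketch of that check, written in Python purely for illustration. I am assuming HMAC-SHA256 keyed with the private site key, which is my own choice and not something we have settled on; `aggregate_token` and `is_valid_request` are hypothetical names, and `query_string` stands in for css_js_query_string.

```python
import hashlib
import hmac


def aggregate_token(files, site_key, query_string):
    """Build the hash from the file names plus the "predictable details"."""
    message = "\n".join(files + [query_string]).encode("utf-8")
    # Assumption: HMAC-SHA256 keyed with the private site key.
    return hmac.new(site_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def is_valid_request(token, files, site_key, query_string):
    """Recompute the hash for the requested files and compare to the one passed in."""
    expected = aggregate_token(files, site_key, query_string)
    # Constant-time comparison; encode first so odd input cannot raise TypeError.
    return hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8"))
```

With something like this, a request that changes the file list, or that was built before the last cache clear changed the query string, no longer matches and the combiner would refuse to generate the file.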
A method like this could be used to remove the db lookup and some of the noted race conditions. The only race condition I currently see would be the file existing/generation between multiple requests at the same time.
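For the remaining file generation race, one common approach (my assumption, not something taken from the existing code) is to write to a temporary file and rename it into place, so a concurrent request never reads a half-written aggregate. A sketch in Python, with `write_aggregate` as a hypothetical name:

```python
import os
import tempfile


def write_aggregate(path, contents):
    """Write the aggregate atomically; returns False if it already exists."""
    if os.path.exists(path):
        return False  # another request already generated it
    directory = os.path.dirname(path) or "."
    # Write to a temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(contents)
        # os.replace swaps the file in as a single step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

Two requests can still both do the work, but since they produce identical contents the second rename is harmless.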
Thoughts?
When the function that creates the script tag is called it could write something to a table that contains a list of the files for that hash. It would be a really fast lookup. But how do you keep that table cleaned up? If you just write to it forever it could grow to be big. You can't truncate it like a cache table because there could be a cached page (in a browser) that tries to pull a file that no longer exists, and we would not have the data to generate it.
Basically, there are multiple race conditions that can occur (one is even noted in the agrcache code comments). We should build a sub-system without the race conditions.
reposting over here....
I was thinking a little further on #1. If we take a look at what espn.go.com does we see something like:
<link rel="stylesheet" charset="utf-8" media="screen" href="http://a.espncdn.com/combiner/c?v=201010121210&css=global_reset.r1.css,base.r228.css,modules.r424.css,global_header.r38.css,modules/global_nav.r47.css,/espn/espn/styles/frontpage_scoreboard_10,modules/insider_enhanced.200910131831.css,sn_icon_sprite.200907150955.css,mem2010/mem.r5.css,mem2010/mem_espn360.r2.css,universal_overlay/universal_overlay.css,universal_overlay/media_overlay.css,universal_overlay/video_overlay.css,universal_overlay/photo_overlay.css,universal_overlay/dyk_overlay.css,warvertical12a.css,twin.css" />
...
<script src="http://a.espncdn.com/combiner/c?v=201012011221&js=jquery-1.4.2.1.js,plugins/json2.r3.js,plugins/teacrypt.js,plugins/jquery.metadata.js,plugins/jquery.bgiframe.js,plugins/jquery.easing.1.3.js,plugins/jquery.hoverIntent.js,plugins/jquery.jcarousel.js,plugins/jquery.tinysort.r3.js,plugins/jquery.vticker.1.3.1.js,plugins/jquery.pubsub.r5.js,ui/1.8.2/jquery.ui.core.js,ui/1.8.2/jquery.ui.widget.js,ui/1.8.2/jquery.ui.tabs.js,plugins/ba-debug-0.4.js,espn.l10n.r8.js,swfobject/2.2/swfobject.js,flashObjWrapper.r7.js,plugins/jquery.colorbox.1.3.14.js,plugins/jquery.ba-postmessage.js,espn.core.duo.r50.js,espn.mem.r15.js,stub.search.r3.js,espn.nav.mega.r24.js,espn.storage.r6.js,espn.p13n.r9.js,espn.video.r33a.js,registration/staticLogin.r10-14.js,espn.universal.overlay.r1.1.js,espn.insider.r5.js,espn.espn360.stub.r9.js,espn.myHeadlines.stub.r12.js,espn.myfaves.stub.r3.js,espn.scoreboard.r4.js,%2Fforesee_v3%2Fforesee-alive.js&debug=false"></script>
We could do something like:
<script src="http://example.com/files/js/js_hash12342341234234?files=misc/jquery.js,misc/jquery.once.js,misc/drupal.js"></script>
The hash would be built like it is now and the combiner could know the files to use. This could be easily cached by the browser and we could test to make sure the files generate the right hash for security. I even like that it documents what files are included in the hash as it can be a pain to discover when you do need to know.