TOPIC: Re:Error in robots.txt file after upgrading to v. 3.5
#2988
Better Web
Fresh Boarder
Posts: 5
User Offline
Error in robots.txt file after upgrading to v. 3.5 2 Years, 11 Months ago Karma: 0  
Hi,

I noticed that when you install the new version of JSitemap Pro 3.5, the robots.txt file gets edited.

Specifically these lines are added to the top:

User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.gif*

I understand the intention, but unfortunately this gives access to EVERYTHING on the server.
If you test, for instance "administrator/index.php" in the robots.txt test tool on the Google Search Console, you will see it's not blocked.

I guess it comes from the way robots.txt files work: you have to be specific about what you allow AND disallow.

So this works better:

User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.gif*
Disallow: /administrator/
Disallow: /cli/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /log/
Disallow: /logs/
Disallow: /tmp/

Googlebot will have access to ALL js, css, ... files on the server, EXCEPT those in the listed directories with the "Disallow" statement.
 
#2989
John Dagelmore
Admin
Posts: 3433
User Offline
Re:Error in robots.txt file after upgrading to v. 3.5 2 Years, 11 Months ago Karma: 75  
Hi, and thanks a lot for your help in reporting this.

However, if you use the following:

User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.gif*
Disallow: /administrator/
Disallow: /cli/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /log/
Disallow: /logs/
Disallow: /tmp/

The Disallow will always win: if someone has a
Disallow: /media/
the js/css files will still be blocked.
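
For reference, Google's robots.txt documentation describes the precedence as "the longest matching rule wins, and Allow wins a tie". For this case it gives the same result, because "/media/" is longer than "/*.js*". Below is a rough Python sketch of that matching. The helper names and the sample paths are made up for illustration, and this is not Google's actual parser:

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules is a list of (allow, pattern) pairs taken from a single group."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for allow, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # The longest pattern wins; on equal length, Allow wins.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow

# Hypothetical Googlebot group: the Allow lines plus two Disallow lines.
rules = [
    (True, "/*.js*"),
    (True, "/*.css*"),
    (False, "/administrator/"),
    (False, "/media/"),
]

print(is_allowed(rules, "/media/system/js/core.js"))   # blocked by /media/
print(is_allowed(rules, "/templates/site/js/app.js"))  # allowed by /*.js*
print(is_allowed(rules, "/administrator/index.php"))   # blocked
```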

Thanks,
John
 
#2990
Better Web
Fresh Boarder
Posts: 5
User Offline
Re:Error in robots.txt file after upgrading to v. 3.5 2 Years, 11 Months ago Karma: 0  
Hi John.

I agree, but as a general security rule, you should only give access to what's necessary.
So one should not include "Disallow: /media/" but there is no need to give access to the other directories I listed.
 
#2991
John Dagelmore
Admin
Posts: 3433
User Offline
Re:Error in robots.txt file after upgrading to v. 3.5 2 Years, 11 Months ago Karma: 75  
I agree with you; indeed, we have modified the robots.txt edit.
The problem is mainly due to the 'User-Agent: Googlebot' group: even if it only gives access to '.js' and '.css' resources, Google then skips the generic 'User-agent: *' block.
It's better to avoid it and simply remove the "Disallow: /media/" line. This is our conclusion after several tests.
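
The group selection can be checked offline. Here is a minimal sketch using Python's standard urllib.robotparser. The sample file and the "OtherBot" name are made up for illustration. This parser does not interpret Google-style "*" wildcards, so it only shows which group applies to which crawler:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Googlebot group added above a generic group.
ROBOTS_TXT = """\
User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*

User-agent: *
Disallow: /administrator/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so the generic Disallow lines are not applied to it.
googlebot_ok = parser.can_fetch("Googlebot", "/administrator/index.php")
# A crawler without its own group falls back to the "*" group.
otherbot_ok = parser.can_fetch("OtherBot", "/administrator/index.php")

print("Googlebot may fetch /administrator/index.php:", googlebot_ok)
print("OtherBot may fetch /administrator/index.php:", otherbot_ok)
```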

Everywhere it seems that the correct and easy way is to add these lines:

User-Agent: Googlebot
Allow: /*.js*
Allow: /*.css*
http://upcity.com/blog/how-to-fix-googlebot-cannot-access-css-and-js-files-error-in-google-search-console/

But, as you pointed out, this solution does not seem fully correct.

Thanks for your help.
 
 
Last Edit: 2015/08/14 16:12 By @store$eco#mm!.
#2992
Better Web
Fresh Boarder
Posts: 5
User Offline
Re:Error in robots.txt file after upgrading to v. 3.5 2 Years, 11 Months ago Karma: 0  
The latest robots.txt included in Joomla goes in that direction:

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/

The /media/ folder has been removed from the list, but you can still find js/css files (not to mention images) in /cache/, /plugins/, /modules/ or /components/.

A quick and dirty way to solve this is to remove these directories from the Disallow list.
Then you can add your Allow lines for js/css/image files. BUT you still need to specify some Disallow rules, unless you don't care about restricting anything. Remember: what Googlebot can access can be indexed, and you may not always want that.

Before I ran into your solution, I used to be very specific, so my robots.txt looks like this:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/

User-agent: googlebot
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
Allow: /plugins/system/jch_optimize/assets2/
Allow: /components/com_fabrik/views/tmpl/

I use the list of blocked resources from the Google Search Console to customize the file. It's more work but gives you the most control.
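
That step can be partly scripted. Here is a small Python sketch that turns a list of blocked resource URLs into directory-level Allow lines. It assumes you have copied the URLs out of the Search Console by hand; the function name, the example.com host and the file names are made up for illustration:

```python
import posixpath
from urllib.parse import urlparse

def allow_lines(blocked_urls):
    """Return one 'Allow:' line per directory that contains a blocked resource."""
    dirs = set()
    for url in blocked_urls:
        path = urlparse(url).path
        # Keep only the directory part and make sure it ends with a slash.
        dirs.add(posixpath.dirname(path).rstrip("/") + "/")
    return ["Allow: " + d for d in sorted(dirs)]

# Hypothetical blocked resources.
blocked = [
    "https://example.com/plugins/system/jch_optimize/assets2/combined.js",
    "https://example.com/components/com_fabrik/views/tmpl/list.css",
    "https://example.com/components/com_fabrik/views/tmpl/list.js",
]

for line in allow_lines(blocked):
    print(line)
```

The resulting lines still need a manual review before they go into the Googlebot group, since a whole directory is opened for each blocked file.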
 