Proposal overview (brief version)

This bot will create or amend up to about 10,000 pages corresponding to mammalian genes. Pages will be created in groups of ~100 to ensure page quality. Each new page will be seeded with content from public-domain databases, including the gene's symbol, description, function, genomic location, structure and identifiers. Pages will be created for genes whose symbol, aliases, and title do not match any existing WP page (e.g., MMP9). Genes that do conflict with existing pages in the WP namespace will be flagged for manual integration (e.g., Apolipoprotein_E). More details are presented in User:ProteinBoxBot/Ideas. This bot is currently being designed and developed by AndrewGNF and JonSDSUGrad.
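The create-or-flag rule described above can be sketched as follows. This is only an illustration, not the bot's actual code: the function names, the title normalization (underscores treated as spaces, first letter capitalized), and the alias used in the usage example are assumptions.

```python
def normalize_title(name):
    # Assumption: underscores and spaces are equivalent, and the first
    # letter of a title is case-insensitive.
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def plan_page_action(symbol, aliases, title, existing_titles):
    """Return ("create", []) if no candidate name is taken,
    otherwise ("flag", [conflicting titles]) for manual integration."""
    existing = {normalize_title(t) for t in existing_titles}
    conflicts = []
    for name in [symbol, *aliases, title]:
        candidate = normalize_title(name)
        if candidate in existing and candidate not in conflicts:
            conflicts.append(candidate)
    if conflicts:
        return ("flag", conflicts)
    return ("create", [])


# Hypothetical snapshot of existing page titles.
existing_pages = {"Apolipoprotein E"}
print(plan_page_action("MMP9", ["GELB"], "Matrix metallopeptidase 9", existing_pages))
print(plan_page_action("APOE", ["AD2"], "Apolipoprotein_E", existing_pages))
```

In this sketch a single conflicting name is enough to flag the gene, which matches the cautious behavior described in the overview.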

Trial Run

A trial run for this bot was approved. The trial was completed and the log is here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run.

After making quite a few adjustments, a second trial run was completed and the log file is here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run3_Char_Fix

The bot was subsequently approved and granted bot status.

The eight pages created by the ProteinBoxBot in the trial are:


In addition, these pre-existing pages were supplemented with ProteinBoxBot content in a semi-automated edit:

APP, APOE, Androgen receptor, BRCA1,
Bcl-2, P21, P16, Beta-catenin,
Epidermal growth factor receptor, HER2/neu, Estrogen receptor, HLA-B,
Insulin-like growth factor 1, Interleukin 10, IL1B, Interleukin 6,
Interleukin 8, CD29, PKC alpha, Retinoblastoma protein,
Src (gene), Tumor necrosis factor-alpha, P53 (protein), Vascular endothelial growth factor,
Caspase 3

The discussion of the ProteinBoxBot's trial run is archived at Wikipedia:Bots/Requests_for_approval/ProteinBoxBot.

Logic Flow

The following flowcharts describe the logic of Protein Box Bot:


Protein Box Bot logs its activities extensively.

Protein Box Bot does not always know the exact name of a protein page. This page has been created to help with that.

Protein Box Bot Quick Manual

When dealing with wiki pages, it is often difficult to determine automatically what to update and how, especially for a bot. A group of BOT tags was therefore created to ensure that Protein Box Bot behaves appropriately and does not overwrite any information without permission. BOT tags are always located within HTML comment delimiters. They are always the first text in the comment, but may be followed by any additional text. The general format is:

 <!-- BOT TAG < = VALUE> <COMMENT> --> 
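As a rough illustration of the general format, the sketch below extracts tags that carry a value. It assumes that each such comment begins with the literal word BOT, that the tag name runs up to the "=" sign, and that the value is a single word such as YES or NO. The exact tag spellings in the sample text are hypothetical, and the function name is not taken from the bot's code.

```python
import re

# Matches: <!-- BOT <tag name> = <VALUE> <optional comment> -->
# The tag must be the first text inside the comment.
BOT_TAG = re.compile(
    r"<!--\s*BOT\s+(?P<tag>(?:(?!-->)[^=])+?)\s*=\s*(?P<value>\w+)"
    r"\s*(?P<comment>(?:(?!-->).)*?)\s*-->",
    re.DOTALL,
)


def parse_bot_tags(page_text):
    """Return {tag name: (value, trailing comment)} for every valued BOT tag."""
    tags = {}
    for match in BOT_TAG.finditer(page_text):
        tags[match.group("tag")] = (match.group("value"), match.group("comment"))
    return tags


sample = (
    "<!-- BOT Update Protein Box = YES set by operator -->\n"
    "{{box}}\n"
    "<!-- ordinary comment -->\n"
    "<!-- BOT Manual Inspection = NO -->"
)
print(parse_bot_tags(sample))
```

Ordinary HTML comments are ignored because they do not start with BOT, so editors can still leave notes in the page source.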

TAG: Manual Inspection (Required)


Setting this tag to YES makes the bot skip updating the protein page and instead log the updated page code. An operator can then review the updated code in the log file and update the protein page manually. This tag is required for bot operation; without it, the bot will skip the page.

TAG: Update Protein Box (Required)


This tag controls whether the bot will update the protein box with new information. A value of YES indicates that the bot should try to overwrite the protein box with new information. A value of NO causes the bot to skip the update of the protein box. This tag is required for bot operation; without it, the bot will skip the page.
During an update, all information in the protein box is overwritten (even with blank values), with the exception of 'image' and 'image_source', which are carried over into the new box. Only if those fields are blank will the bot try to locate an image. Default image file names follow this format:

PBB_Protein_<protein symbol>_image.jpg

Where <protein symbol> is the actual symbol for the protein (such as PBB_Protein_AKT1_image.jpg).
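The overwrite rule and the default image name can be illustrated with a short sketch. It assumes a protein box can be modeled as a dictionary of field names to strings; apart from 'image' and 'image_source', the field names and the helper names are hypothetical.

```python
PRESERVED_FIELDS = ("image", "image_source")


def default_image_name(symbol):
    # File name pattern taken from the manual: PBB_Protein_<protein symbol>_image.jpg
    return f"PBB_Protein_{symbol}_image.jpg"


def merge_protein_box(old_box, new_box):
    """Overwrite every field with the new value (even if blank),
    except the image fields, which are carried over from the old box."""
    merged = dict(new_box)
    for field in PRESERVED_FIELDS:
        merged[field] = old_box.get(field, "")
    return merged


def needs_image_lookup(box):
    # The bot only looks for an image when the carried-over fields are blank.
    return all(not box.get(field, "").strip() for field in PRESERVED_FIELDS)


old = {"name": "old name", "function": "kinase", "image": "x.png", "image_source": "PDB"}
new = {"name": "AKT1", "function": "", "image": "", "image_source": ""}
merged = merge_protein_box(old, new)
print(merged)
print(needs_image_lookup(merged))
print(default_image_name("AKT1"))
```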

TAG: Update Summary (Required)


This tag marks the START location of the protein summary update. It also indicates whether the existing summary should be updated (a value of NO means that an update should NOT occur). If an update is desired, all text immediately following this tag will be overwritten with the updated summary information, up to the End Summary tag (see below). This tag is required for bot operation; the page will be skipped without it.

TAG: End Summary (Required)


This tag denotes the END location of the current summary. During an update, all information between the Update Summary tag and this tag will be overwritten with the updated summary information. This tag is required for bot operation; the page will be skipped without it.
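The start and end tags together delimit the region that is replaced. A minimal sketch of that replacement is shown below. The marker strings are passed in as parameters because their exact spelling is not reproduced here; the values used in the example are hypothetical, as is the function name.

```python
def replace_summary(page_text, new_summary, start_marker, end_marker, update=True):
    """Replace the text between the two markers with new_summary.

    Returns None when either marker is missing (the page would be skipped),
    and the unchanged text when update is False.
    """
    start = page_text.find(start_marker)
    if start == -1:
        return None
    body_start = start + len(start_marker)
    end = page_text.find(end_marker, body_start)
    if end == -1:
        return None
    if not update:
        return page_text
    # Keep both markers in place so later runs can find the region again.
    return page_text[:body_start] + "\n" + new_summary + "\n" + page_text[end:]


START = "<!-- BOT Update Summary = YES -->"  # hypothetical spelling
END = "<!-- BOT End Summary -->"             # hypothetical spelling
page = "intro\n" + START + "\nold text\n" + END + "\nrest"
print(replace_summary(page, "new text", START, END))
```

Text outside the two markers is left untouched, which is what allows hand-written sections to coexist with the bot-maintained summary.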

TAG: No Bots (Optional)

<!-- NO BOT EDITS -->



This tag causes the bot to skip the page. Because its presence aborts the bot's operation on the page, the tag is optional and is not required for bot operation.
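A check for this tag might look like the sketch below. It assumes the match is case-sensitive and that, as with the other BOT tags, the tag text must come first in the comment and may be followed by additional text; the function name is hypothetical.

```python
import re

# The tag must be the first text in the comment; extra text may follow it.
NO_BOTS = re.compile(r"<!--\s*NO BOT EDITS\b.*?-->", re.DOTALL)


def bot_may_edit(page_text):
    """Return False when the page carries the NO BOT EDITS tag."""
    return NO_BOTS.search(page_text) is None


print(bot_may_edit("Some text <!-- NO BOT EDITS --> more text"))
print(bot_may_edit("Some text without the tag"))
```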