Background
Mandatory deposition of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix .cel files) can be time-consuming and complex, and it requires fundamental computational and bioinformatics skills. Analytical workflows that automate these tasks simplify processing, improve efficiency, and help standardize multiple and sequential analyses. Once installed, workflows take over the tedious steps required to run rapid intra- and inter-dataset comparisons.
For more background, please refer to the publication:
MAAMD: A Workflow to Standardize Meta-Analyses and Comparison of Affymetrix Microarray Data, BMC Bioinformatics, 2014
Q&A Forum of MAAMD:
https://groups.google.com/forum/#!forum/maamd
Download Workflows & Samples
MAAMD:
This is a one-step workflow. It runs the whole analysis, from downloading and quality control through meta-analysis to inter-dataset comparison. It creates the folders for data and results automatically, which avoids mistakes such as pointing a step at the wrong data location.
This workflow suits users who are not yet familiar with MAAMD.
The package includes a 'workflow' folder and a 'sample' folder. You can follow 'A Study Case based on MAAMD' to learn how to use MAAMD.
Two versions of MAAMD are included in this package: MAAMD-ALL and MAAMD-ALL-NoQC.
MAAMD-Individual:
For the most part, these individual workflows perform the same functions as MAAMD. Because they are separate, you can execute them one by one, which makes them more flexible. You need to prepare the inputs for each workflow in the required format. For more details, please read the publication.
These individual workflows suit users who are familiar with MAAMD and want more control.
MAAMD-Download.xml:
function: downloads and decompresses targeted data sets.
input: edited datasets.csv
output: downloaded data sets
MAAMD-AltAnalyze.xml or MAAMD-AltAnalyze-NoQC.xml:
function: analyzes microarray data sets listed in the input file with or without quality estimation.
input: datasets.csv and input files which describe the details for each dataset.
output: folders containing the analyzed results for each dataset.
MAAMD-Comparison.xml:
function: compares the results of analyzed datasets.
input: datasets.csv and results analyzed by MAAMD-AltAnalyze or MAAMD-AltAnalyze-NoQC.
output: an Excel file which lists conserved genes between datasets.
Note:
1. MAAMD-Comparison.xml lets you compare your datasets in different ways, independently of the analysis workflows.
2. Resource files that come with MAAMD, such as homologene.txt, are required. Please copy these workflow files to the folder where MAAMD is located.
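The idea of "conserved genes between datasets" can be illustrated with a plain set intersection. The following is only a minimal sketch and not the MAAMD-Comparison implementation: the function name, the dataset labels and the gene lists are hypothetical, and the homolog mapping that homologene.txt supports for cross-species comparison is left out entirely.

```python
from typing import Dict, Set


def conserved_genes(results: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene identifiers that appear in every dataset's result set."""
    if not results:
        return set()
    # Intersect all per-dataset gene sets; identifiers are compared exactly.
    return set.intersection(*results.values())


# Hypothetical per-dataset gene lists, for illustration only.
example = {
    "dataset_A": {"TP53", "MYC", "EGFR"},
    "dataset_B": {"TP53", "EGFR", "KRAS"},
    "dataset_C": {"EGFR", "TP53", "BRCA1"},
}

print(sorted(conserved_genes(example)))
```

A real comparison across species would first translate gene identifiers to a common homolog group before intersecting.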
Attention! The current version of the R package GEOquery fails to download the raw data from the supplementary files (tested on Mar 27, 2018).
This issue causes MAAMD to fail. We recommend downloading the raw data manually and using MAAMD-Individual to skip the download step.
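For the manual download, a small script can fetch the raw archive of a GEO series. This is a minimal sketch under stated assumptions: the URL layout below reflects how GEO's public file server is commonly organized and should be verified on the series page, not every series provides a `_RAW.tar` file, and the helper names `geo_raw_url` and `download_raw` are hypothetical. The folder layout that MAAMD-Individual expects for the unpacked files is not covered here.

```python
import re
import urllib.request
from pathlib import Path

# Assumed layout of the GEO file server; verify it on the series page.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_raw_url(accession: str) -> str:
    """Build the assumed URL of the raw-data archive for a GEO series."""
    acc = accession.strip().upper()
    if not re.fullmatch(r"GSE\d+", acc):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = acc[3:]
    # Series are grouped in folders where the last three digits become "nnn".
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{acc}/suppl/{acc}_RAW.tar"


def download_raw(accession: str, dest_dir: str) -> Path:
    """Download the archive into dest_dir and return the local file path."""
    target = Path(dest_dir) / f"{accession.strip().upper()}_RAW.tar"
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(geo_raw_url(accession), str(target))
    return target


# Only builds and prints the URL; call download_raw(...) to fetch the file.
print(geo_raw_url("GSE9400"))
```

After the download, unpack the archive and check that the sample file names match the names in your input CSV.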
MAAMD-Local-Online:
This workflow works not only with online GEO datasets but also with your local Affymetrix data. The basic steps are similar to those of MAAMD-ALL. You only need to edit your input file to follow the format for local datasets. Samples of input files are listed below.
Note:
Resource files that come with MAAMD, such as homologene.txt, are required. Please copy these workflow files to the folder where MAAMD is located.
Select the NoQC workflows, which omit the quality control module, in any of the following situations:
1. If you can't install the Bioconductor packages successfully.
If arrayQualityMetrics or affyQCReport cannot be installed completely, MAAMD with QC will throw errors and stop the analysis. The NoQC version lets you run the workflow even if these Bioconductor packages are not installed.
2. If quality control is unnecessary.
When you are confident in the microarray quality, you can use the NoQC version of MAAMD. This version skips the quality estimation step and therefore saves a lot of time.
3. If you want to analyze non-Affymetrix microarrays.
The full version of MAAMD can process Affymetrix microarrays. Other types of microarray may cause errors. The NoQC version does not have this limitation and can process Affymetrix, Illumina or Agilent microarrays. The file 'arrayfileinfo.xls' below lists the microarrays that the MAAMD NoQC version can analyze.
arrayfileinfo.xls (file size: 41 kb; file type: xls)
Instructions of Installation & Configuration
Installation & Configuration of R
>> Right-click "Computer" and select "Properties".
>> Go to "Advanced system settings" and click the sub-menu "Advanced".
>> Select "Environment Variables…".
>> Scroll down the "System Variables" list and select the variable "Path".
>> Add the R.exe path to the end, using ";" as the separator.
Note: For a 64-bit OS, the path should look like "C:\Program Files\R\R-3.0.0\bin\x64". For a 32-bit OS, the path should look like "C:\Program Files\R\R-3.0.0\bin\i386".
3. Installation of R packages
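Before moving on, it can help to confirm that the "Path" change for R described above took effect. The check below is a minimal sketch, not part of MAAMD. It only asks the operating system whether an R executable can be found on the current PATH, and the helper name `find_r` is hypothetical. Open a new console after editing the environment variables, otherwise the old PATH is still in use.

```python
import shutil
from typing import Optional


def find_r() -> Optional[str]:
    """Return the full path of Rscript or R if one is on PATH, else None."""
    # shutil.which applies the platform's PATH lookup (and PATHEXT on Windows).
    return shutil.which("Rscript") or shutil.which("R")


location = find_r()
if location is None:
    print("R was not found on PATH; re-check the 'Path' system variable.")
else:
    print(f"R found at: {location}")
```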
Installation & Configuration of Kepler
"<maxWaitTime>-1</maxWaitTime>"
This removes the time limit on pop-up webpages, so Kepler keeps waiting until the user makes a decision. Without this change, you need to respond within 300 seconds.
Note: When you update Kepler, the corresponding configuration.xml needs to be updated again because of Kepler's settings. The updated configuration.xml can be found at $HOME/KeplerData/kepler.modules/common-2.4.X/resources/configurations/.
Installation & Configuration of AltAnalyze
A. Open a command-line console: from "Start" >> "Run", type "cmd".
B. Change the directory to the location of AltAnalyze using a command like "cd C:\tools\AltAnalyze_v.2.0.8".
C. Type "AltAnalyze.exe". You should see AltAnalyze's GUI. Otherwise, please check the version of your AltAnalyze or contact the author.
D. Click "Begin Analysis".
>> If this is the first time you have run AltAnalyze, a prompt window will appear indicating that no species database was found.
>> Click "Continue", select the species that you want to analyze, then click "Continue" again. AltAnalyze will download the corresponding resources automatically.
>> Click "Quit" after the download is complete.
Attention! AltAnalyze usually downloads annotation files automatically. However, the download sometimes fails, either because the download server is down or because the relevant annotation databases are unavailable (the exact reason is unclear; it did not work for mouse microarray annotation files on Mar 27, 2018). After you finish an analysis, read the AltAnalyze log file to check that everything is okay. If there is an issue, it is better to leave the affected one out of further analysis.
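Reading a long log by eye is error-prone, so a short script can flag lines worth a closer look. This is a minimal sketch under loud assumptions: the format of the AltAnalyze log is not described here, so the keyword list is a guess, the sample text is made up for illustration and is not real AltAnalyze output, and the function name `suspicious_lines` is hypothetical.

```python
from typing import List

# Assumed markers of trouble; adjust them after looking at a real log file.
KEYWORDS = ("error", "failed", "traceback")


def suspicious_lines(log_text: str) -> List[str]:
    """Return the log lines that contain any of the assumed keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in KEYWORDS)
    ]


# Made-up sample text, only to show how the function behaves.
sample = "step one finished\nDownload FAILED for annotation file\nstep two finished"

for line in suspicious_lines(sample):
    print(line)
```

To scan a real file, read it first, for example with `open(path, encoding="utf-8", errors="replace").read()`, and pass the text to the function.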
Workflow Procedure
1. Download the MAAMD zip package and unzip it to C:, so that you have a folder "C:\MAAMD" which contains a "workflow" folder and a "sample" folder.
2. Search the GEO database (http://www.ncbi.nlm.nih.gov/geo/). Look for data sets and collect the data set information.
Note: Don't use data sets that have no raw data available. MAAMD needs raw data.
3. Edit the input CSV files for the selected data sets, following the fixed file format.
Refer to "sample-online datasets.csv" for the format of online datasets input
Refer to "sample-local datasets.csv" for the format of local datasets input
Refer to "sample-details of one dataset.csv" for the format of the samples in an individual data set
Note: A. Please do not modify the column names.
B. The suffix ".CEL" is required for both "SampleName" and "NewName".
C. Sometimes the data supplier names the samples differently from what is listed in the GEO sample list. In this case, you need to keep the sample names in the sample file (e.g. datainfo-gse9400.csv) consistent with the exact sample file names. Otherwise, MAAMD will fail with an error because it can't find the corresponding sample files.
It is a good idea to check the sample names after the files are downloaded and uncompressed. Rerun MAAMD if the input files are modified.
4. Start Kepler and open the MAAMD workflow in Kepler. Keep the Internet connection open while MAAMD is running.
5. Edit the parameters for MAAMD.
Nset: the number of datasets that you want to analyze.
Note: If Nset is smaller than the number of data sets listed in datasets.csv, MAAMD will analyze only the first “N” data sets.
WorkPath: the folder where you want to store the data and results.
DataFile: the path of the csv file where you collect all datasets’ information.
Note: This csv file contains the summary of all targeted datasets. Refer to “datasets.csv” as an example.
MAAMDPath: the folder where you store MAAMD workflows.
Note: All paths must use the forward slash "/" as the path delimiter. "\" does not work in Kepler on Windows.
AltAnalyze: the directory where AltAnalyze is located, for example, "C:/AltAnalyze_v.2.0.8-Win64".
6. Click the "run" button.
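Two of the notes above lend themselves to a quick pre-flight check before you click "run": the ".CEL" suffix rule for "SampleName" and "NewName", and the forward-slash rule for paths. The following is a minimal sketch, not part of MAAMD. The helper names are hypothetical, the sample rows are made up, the real sample file may contain more columns than the two checked here, and treating the suffix as case-sensitive is an assumption.

```python
import csv
import io
from pathlib import PureWindowsPath
from typing import List


def to_kepler_path(path: str) -> str:
    """Rewrite a Windows path with forward slashes, as the parameters require."""
    return PureWindowsPath(path).as_posix()


def check_sample_rows(csv_text: str) -> List[str]:
    """Report rows whose SampleName or NewName does not end with .CEL."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column in ("SampleName", "NewName"):
            value = (row.get(column) or "").strip()
            if not value.endswith(".CEL"):  # case-sensitive by assumption
                problems.append(f"row {row_number}: {column}={value!r} lacks .CEL")
    return problems


# Made-up rows for illustration; only the two column names come from the text.
sample_csv = (
    "SampleName,NewName\n"
    "GSM000001.CEL,control_1.CEL\n"
    "GSM000002.cel,treated_1\n"
)

print(to_kepler_path(r"C:\MAAMD\workflow"))
for problem in check_sample_rows(sample_csv):
    print(problem)
```

For a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `check_sample_rows`.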