Workflows

Background

Mandatory deposition of raw microarray data files for public access prior to study publication provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analyzing raw microarray data files (e.g. Affymetrix .cel files) can be time-consuming and complex, and requires fundamental computational and bioinformatics skills. Analytical workflows that automate these tasks simplify processing, improve efficiency, and standardize multiple and sequential analyses. Once installed, the workflows take over the tedious steps required to run rapid intra- and inter-dataset comparisons.

For more background, please refer to the publication:
MAAMD: A Workflow to Standardize Meta-Analyses and Comparison of Affymetrix Microarray Data, BMC Bioinformatics, 2014

Q&A forum for MAAMD:
https://groups.google.com/forum/#!forum/maamd

Download Workflows & Samples

MAAMD:
This is a one-step workflow that covers the whole analysis, from downloading and quality control through meta-analysis to inter-dataset comparison. It automatically creates folders to store data and results, which avoids mistakes such as pointing to the wrong data location.
This workflow suits users who are not yet familiar with MAAMD.
The package includes a 'workflow' folder and a 'sample' folder. Follow 'A Study Case based on MAAMD' to learn how to use MAAMD.
Two versions of MAAMD are included in this package: MAAMD-ALL and MAAMD-ALL-NoQC.

MAAMD-Individual:
These individual workflows perform essentially the same functions as MAAMD, but as discrete workflows that you execute one by one, which makes them more flexible. Users need to prepare proper inputs for each workflow in the designed formats. For more details, please read the publication.
The individual workflows suit users who are familiar with MAAMD and want more control.
MAAMD-Download.xml:
function: downloads and decompresses targeted data sets.
input: an edited datasets.csv
output: downloaded data sets
MAAMD-AltAnalyze.xml or MAAMD-AltAnalyze-NoQC.xml:
function: analyzes microarray data sets listed in the input file with or without quality estimation.
input: datasets.csv and input files which describe the details for each dataset.
output: folders containing the analyzed results for each dataset.
MAAMD-Comparison.xml:
function: compares the results of analyzed datasets.
input: datasets.csv and results analyzed by MAAMD-AltAnalyze or MAAMD-AltAnalyze-NoQC.
output: an Excel file listing conserved genes between datasets.
Note:
1. MAAMD-Comparison.xml allows you to compare your datasets in different ways, independently of the analysis workflows.
2. Resource files that come with MAAMD, such as homologene.txt, are required. Please copy these workflow files into the folder where MAAMD is located.
Attention! The current version of the R package GEOquery cannot download raw data from the supplementary files (tested on Mar 27, 2018).
This issue causes MAAMD to fail. We recommend downloading the raw data manually and using MAAMD-Individual to skip the download step.
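If you download raw data manually, a small script can save some clicking. The sketch below is a hypothetical helper, not part of MAAMD. It assumes GEO's FTP layout, where the supplementary files of a series such as GSE9400 live under .../geo/series/GSE9nnn/GSE9400/suppl/ and the raw data archive is named GSE9400_RAW.tar containing gzip-compressed .CEL files. Check the URL in a browser first, and place the decompressed files wherever your MAAMD-Individual inputs expect them.

```python
import gzip
import re
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Assumed GEO FTP root; series folders group accessions by all but the last three digits.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def raw_tar_url(gse: str) -> str:
    """Return the assumed URL of the <GSE>_RAW.tar supplementary archive."""
    gse = gse.strip().upper()
    match = re.fullmatch(r"GSE(\d+)", gse)
    if match is None:
        raise ValueError(f"not a GEO series accession: {gse!r}")
    digits = match.group(1)
    # e.g. GSE9400 -> GSE9nnn, GSE12345 -> GSE12nnn, GSE400 -> GSEnnn
    folder = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{folder}/{gse}/suppl/{gse}_RAW.tar"


def gunzip_all(folder: Path) -> list:
    """Decompress every *.gz file in folder (e.g. GSM1.CEL.gz -> GSM1.CEL)."""
    outputs = []
    for gz_file in sorted(folder.glob("*.gz")):
        target = gz_file.with_suffix("")  # drop only the trailing .gz
        with gzip.open(gz_file, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(target)
    return outputs


def fetch_raw_data(gse: str, dest: Path) -> list:
    """Download and unpack the raw archive of one series into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    tar_path = dest / f"{gse.strip().upper()}_RAW.tar"
    urllib.request.urlretrieve(raw_tar_url(gse), tar_path)
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)  # only extract archives from a source you trust
    return gunzip_all(dest)
```

For example, fetch_raw_data("GSE9400", Path("C:/MAAMD/data/GSE9400")) downloads and decompresses one series. The destination folder in this example is hypothetical.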

MAAMD-Local-Online:
This workflow works not only with online GEO datasets but also with your local Affymetrix data. The basic steps are similar to MAAMD-ALL; you only need to edit your input file following the format for local datasets. Sample input files are listed below.
Note:
Resource files that come with MAAMD, such as homologene.txt, are required. Please copy these workflow files into the folder where MAAMD is located.

Workflows:

MAAMD.zip (4798 kb)
MAAMD-Individual.zip (58 kb)
maamd-local-online.zip (115 kb)

Input Samples & Manual:

sample-online datasets.csv
sample-local datasets.csv
sample-details of one dataset.csv
maamd-v1.pdf (2191 kb)

Select the NoQC workflows, which omit the quality control module, in the following situations:
1. You cannot install the Bioconductor packages successfully.
If arrayQualityMetrics or affyQCReport is not completely installed, MAAMD with QC will throw errors and stop the analyses. The NoQC version lets you run the workflow even if these Bioconductor packages are not installed.
2. Quality control is unnecessary.
When you are confident in the microarray quality, you can use the NoQC version of MAAMD. It skips the quality estimation step and hence saves a lot of time.
3. You want to analyze non-Affymetrix microarrays.
The full version of MAAMD is designed for Affymetrix microarrays; other types of microarray may cause errors. The NoQC version does not have this limitation and can process Affymetrix, Illumina, or Agilent microarrays. The file 'arrayfileinfo.xls' below lists the microarrays that the NoQC version of MAAMD can analyze.

arrayfileinfo.xls (41 kb)
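The three rules above reduce to a simple decision: use a NoQC workflow as soon as any one of them applies. The sketch below only encodes that reasoning and is not part of MAAMD. The workflow names come from this page, while the Situation type and the choose_workflow function are hypothetical. Whether a non-Affymetrix array is supported at all still has to be checked against arrayfileinfo.xls.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    qc_packages_installed: bool  # arrayQualityMetrics and affyQCReport load without errors
    qc_needed: bool              # you want the quality estimation step
    is_affymetrix: bool          # all arrays in the analysis are Affymetrix


def choose_workflow(s: Situation, individual: bool = False) -> tuple:
    """Return (workflow name, reasons for NoQC) following the three rules above."""
    reasons = []
    if not s.qc_packages_installed:
        reasons.append("QC packages are not installed")
    if not s.qc_needed:
        reasons.append("quality control is unnecessary")
    if not s.is_affymetrix:
        reasons.append("non-Affymetrix arrays")
    if individual:
        name = "MAAMD-AltAnalyze-NoQC.xml" if reasons else "MAAMD-AltAnalyze.xml"
    else:
        name = "MAAMD-ALL-NoQC" if reasons else "MAAMD-ALL"
    return name, reasons
```

For example, choose_workflow(Situation(True, True, False)) returns "MAAMD-ALL-NoQC" together with the reason "non-Affymetrix arrays".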

Instructions for Installation & Configuration

Installation & Configuration of R

1. Download and install R (R 3.0.0 or above) from http://cran.r-project.org/
2. Add the folder containing R.exe to the "Path" variable in the 'System variables' list.

For Windows:
>> Right-click "Computer" and select "Properties".
>> Go to "Advanced system settings" and click the "Advanced" tab.
>> Select "Environment Variables…".
>> Scroll down the "System variables" list and select the variable "Path".
>> Append the folder path of R.exe to the end, using ";" as the separator.
Note:
For a 64-bit OS, the path should look like "C:\Program Files\R\R-3.0.0\bin\x64".
For a 32-bit OS, the path should look like "C:\Program Files\R\R-3.0.0\bin\i386".

3. Installation of R packages
A. Ensure that you have permission to update the R libraries.
For Windows:
>> Go to the directory where R is installed.
>> Right-click the folder and select "Properties".
>> On the "Security" tab, edit the permissions and make sure you have "Write" permission.

B. Open the R console.
If both the 32-bit and 64-bit versions are installed, open the one that matches
the path you added to the system variables.
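C. Enter the following commands in the R console to install the required Bioconductor packages:
source("http://www.bioconductor.org/biocLite.R")
biocLite()
biocLite("affyQCReport")
biocLite("GEOquery")
biocLite("arrayQualityMetrics")
With newer R releases (Bioconductor 3.8 and later), biocLite has been replaced by BiocManager:
install.packages("BiocManager")
BiocManager::install(c("affyQCReport", "GEOquery", "arrayQualityMetrics"))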
D. Enter the following commands to test whether the packages have been installed successfully:
library(affyQCReport)
library(GEOquery)
library(arrayQualityMetrics)
You need to install these libraries properly before running the MAAMD workflow.
Attention! As of Mar 27, 2018, the current version of GEOquery cannot download supplementary files.
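The R setup steps above can also be checked from a script, which helps on machines where several R versions are installed. The sketch below is hypothetical and not part of MAAMD. It assumes Rscript is reachable through the Path variable and that r_home points to an installation folder such as "C:\Program Files\R\R-3.0.0". In place of the interactive library() calls, it uses R's requireNamespace(), which loads each package without attaching it.

```python
import os
import re
import shutil
import subprocess
import tempfile

REQUIRED_PACKAGES = ["affyQCReport", "GEOquery", "arrayQualityMetrics"]


def r_bin_dir(r_home: str, is_64bit: bool) -> str:
    """Folder that holds R.exe, e.g. C:\\Program Files\\R\\R-3.0.0\\bin\\x64."""
    return r_home.rstrip("\\") + "\\bin\\" + ("x64" if is_64bit else "i386")


def append_to_path(path_value: str, new_dir: str) -> str:
    """Append new_dir to a Windows Path value with ';', skipping duplicates."""
    entries = [entry for entry in path_value.split(";") if entry]
    if new_dir.lower() not in {entry.lower() for entry in entries}:
        entries.append(new_dir)
    return ";".join(entries)


def parse_r_version(text: str) -> tuple:
    """Extract (major, minor, patch) from a string such as 'R version 3.0.0 (...)'."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no R version string found")
    return tuple(int(part) for part in match.groups())


def can_write(folder: str) -> bool:
    """Try to create a scratch file; os.access does not reflect Windows ACLs reliably."""
    try:
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        return False


def package_check_command(packages=REQUIRED_PACKAGES) -> list:
    """Rscript call that exits with an error if any package fails to load."""
    names = ", ".join(f"'{name}'" for name in packages)
    expr = (f"for (p in c({names})) "
            "if (!requireNamespace(p, quietly = TRUE)) stop('cannot load ', p)")
    return ["Rscript", "-e", expr]


def check_r_setup(r_home: str) -> list:
    """Return a list of problems found in the local R installation."""
    if shutil.which("Rscript") is None:
        return ["Rscript not found: add R's bin folder to the Path variable"]
    problems = []
    version = subprocess.run(["Rscript", "-e", "cat(R.version.string)"],
                             capture_output=True, text=True)
    if parse_r_version(version.stdout) < (3, 0, 0):
        problems.append("R 3.0.0 or above is required")
    if not can_write(os.path.join(r_home, "library")):
        problems.append("no write permission on the R library folder")
    check = subprocess.run(package_check_command(), capture_output=True, text=True)
    if check.returncode != 0:
        problems.append(check.stderr.strip())
    return problems
```

An empty list from check_r_setup(r"C:\Program Files\R\R-3.0.0") suggests that R is ready for MAAMD. The library path R_HOME\library is R's default location; a custom library location would need its own write check.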

Installation & Configuration of Kepler

Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html to download and install JDK 7.
Go to https://kepler-project.org/users/downloads to download and install Kepler (Kepler 2.4 or above).
To check whether Kepler is installed properly, start Kepler by double-clicking its icon. You should see Kepler's graphical user interface.
After installation, go to the directory where Kepler is installed and open the file /common-2.4.0/resources/configurations/configuration.xml.

Modify the line "<maxWaitTime>300</maxWaitTime>" to
"<maxWaitTime>-1</maxWaitTime>"
This removes the time limit on pop-up web pages, so Kepler keeps waiting until the user makes a decision. Without this change, you need to respond within 300 seconds.
Note: After you update Kepler, the corresponding configuration.xml needs to be edited again, because of the way Kepler stores its settings. The updated configuration.xml can be found at $HOME/KeplerData/kepler.modules/common-2.4.X/resources/configurations/.
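Because this edit has to be repeated after each Kepler update, it can be scripted. The sketch below is a hypothetical helper, not part of Kepler or MAAMD. It rewrites the maxWaitTime value with a regular expression instead of an XML parser, so that comments and formatting in configuration.xml stay untouched. It also assumes the per-user copies follow the $HOME/KeplerData/kepler.modules/common-*/resources/configurations/ pattern given above.

```python
import re
from pathlib import Path

MAX_WAIT = re.compile(r"<maxWaitTime>\s*-?\d+\s*</maxWaitTime>")


def set_max_wait_time(xml_text: str, seconds: int = -1) -> str:
    """Rewrite <maxWaitTime>; -1 means wait indefinitely for the user."""
    new_text, count = MAX_WAIT.subn(f"<maxWaitTime>{seconds}</maxWaitTime>", xml_text)
    if count == 0:
        raise ValueError("no <maxWaitTime> element found")
    return new_text


def patch_config(config: Path, seconds: int = -1) -> None:
    """Edit configuration.xml in place, keeping a .bak copy of the original."""
    text = config.read_text(encoding="utf-8")
    config.with_name(config.name + ".bak").write_text(text, encoding="utf-8")
    config.write_text(set_max_wait_time(text, seconds), encoding="utf-8")


def user_config_files(home=None) -> list:
    """Per-user copies under $HOME/KeplerData/kepler.modules/common-*/..."""
    modules = (home or Path.home()) / "KeplerData" / "kepler.modules"
    return sorted(modules.glob("common-*/resources/configurations/configuration.xml"))
```

Running "for cfg in user_config_files(): patch_config(cfg)" after an update would restore the unlimited wait time. The copy in the Kepler installation directory can be passed to patch_config directly.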

Installation & Configuration of AltAnalyze

Go to http://code.google.com/p/altanalyze/downloads/list?can=1&q and download AltAnalyze 2.0.8.
Unzip it to your desired directory.
To make sure AltAnalyze works properly and has installed the species database, follow these steps.

For Windows:
A. Open a command-line console: from "Start" >> "Run", type "cmd".
B. Change to the directory where AltAnalyze is located,
using a command such as "cd C:\tools\AltAnalyze_v.2.0.8".
C. Type "AltAnalyze.exe"; you should see AltAnalyze's GUI.
If not, please check the version of your AltAnalyze or contact the author.
D. Click "Begin Analysis".
>> If this is the first time you've run AltAnalyze, a prompt window will appear indicating that no species database was found.
>> Click "Continue", select the species you want to analyze,
then click "Continue". AltAnalyze will download the corresponding resources automatically.
>> Click "Quit" after the download is complete.

Attention! AltAnalyze usually downloads annotation files automatically. However, this sometimes fails, either because the download server is down or because the relevant annotation databases are unavailable (the exact cause is unclear; on Mar 27, 2018, it did not work for mouse microarray annotation files).
After each analysis, read the AltAnalyze log file to confirm that everything ran correctly. If a dataset has an issue, it is better to exclude it from further analysis.

Workflow Procedure

1. Download the MAAMD zip package and unzip it to C:\, so that you have a folder "C:\MAAMD" containing a "workflow" folder and a "sample" folder.
2. Search the GEO database http://www.ncbi.nlm.nih.gov/geo/ for datasets and collect their information.
Note: Do not use datasets without raw data; MAAMD requires raw data.
3. Edit the input CSV files for the selected datasets, following the fixed file format:
Refer to "sample-online datasets.csv" for the format of online datasets input
Refer to "sample-local datasets.csv" for the format of local datasets input
Refer to "sample-details of one dataset.csv" for the format of the samples in an individual data set
Note: A. Please do not modify the column names.
B. The suffix ".CEL" is required for both "SampleName" and "NewName".
C. Sometimes the data supplier names samples differently from the GEO sample list. In this case, keep the sample names in the sample file (e.g. datainfo-gse9400.csv) consistent with the actual sample file names; otherwise MAAMD will fail because it cannot find the corresponding sample files.
It is a good idea to check the sample names after the files are downloaded and decompressed. Rerun MAAMD if you modify the input files.
4. Start Kepler and open the MAAMD workflow. Keep the Internet connection active while MAAMD is running.
5. Edit the parameters for MAAMD.
Nset: the number of datasets that you want to analyze.
Note: If Nset is smaller than the number of datasets listed in datasets.csv, MAAMD analyzes only the first Nset datasets.
WorkPath: the folder where you want to store the data and results.
DataFile: the path of the CSV file that collects the information of all datasets.
Note: This CSV file summarizes all targeted datasets. Refer to "datasets.csv" as an example.
MAAMDPath: the folder where you store the MAAMD workflows.
Note: All paths must use forward slashes ("/") as the delimiter; backslashes ("\") do not work in Kepler on Windows.
AltAnalyze: the AltAnalyze directory, for example "C:/AltAnalyze_v.2.0.8-Win64".
6. Click the "Run" button.
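Most failures at this stage come from mismatched sample names or badly formed paths, so a quick pre-run check can save a rerun. The sketch below is a hypothetical helper and not part of MAAMD. It relies only on the "SampleName" and "NewName" columns and the rules stated above; other columns in the sample files are ignored. The data folder passed to check_sample_file is assumed to be the folder where the downloaded samples of that dataset were decompressed.

```python
import csv
import os
from pathlib import Path

REQUIRED_COLUMNS = ("SampleName", "NewName")


def to_kepler_path(path: str) -> str:
    """Kepler on Windows needs '/' as the path delimiter."""
    return path.replace("\\", "/")


def first_n_rows(rows: list, nset: int) -> list:
    """MAAMD analyzes only the first Nset datasets listed in datasets.csv."""
    return rows[:nset]


def sample_name_problems(header, rows, present_files) -> list:
    """Check the SampleName/NewName columns against the files actually on disk."""
    problems = [f"missing column {col!r} (do not rename columns)"
                for col in REQUIRED_COLUMNS if col not in header]
    if problems:
        return problems
    for row_no, row in enumerate(rows, start=1):  # data rows, header excluded
        for col in REQUIRED_COLUMNS:
            value = row.get(col) or ""
            if not value.endswith(".CEL"):
                problems.append(f"row {row_no}: {col} {value!r} lacks the .CEL suffix")
        if row.get("SampleName") not in present_files:
            problems.append(f"row {row_no}: {row.get('SampleName')!r} not found in the data folder")
    return problems


def check_sample_file(sample_csv: Path, data_dir: Path) -> list:
    """Read one sample file (e.g. datainfo-gse9400.csv) and list any problems."""
    with open(sample_csv, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        header = reader.fieldnames or []
    return sample_name_problems(header, rows, set(os.listdir(data_dir)))
```

The "utf-8-sig" encoding tolerates the byte-order mark that Excel sometimes writes into CSV files. The suffix check is case-sensitive because the note above requires ".CEL" exactly.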

Click here for step-by-step guidance with user interface screenshots.
