Learning R

Beginning at the beginning: R Inferno - nice pdf, probably helps if you are already a programmer.
--------------------------------------------------------------------------------------
Youtube! apply effectively removes the need for a double for loop. tapply expects vectors.
-----------------------
This is wrong!: mean(2, -100, -4, 3, -230, 5) (only the first argument is treated as the data, so this returns 2)
This is correct!: mean(c(2, -100, -4, 3, -230, 5))

7:48 PM | Filed Under | 0 Comments

11/12/2012

If using SQLForge, need to have installed (but not necessarily loaded) human.db0
---------------------------------------------------------------------------------
plot(.., col=ifelse(foo>2, "red", "black"))
-------------------------------------------

1:35 PM | Filed Under | 0 Comments

The aim of the moderated t-test is to test whether the log-ratios are consistently greater or lesser than zero, not to test whether the log-ratios are equal. The moderated t-test does not penalize a gene for having a large variance as much as the ordinary t-test would do. Compared to the ordinary t-test, the moderated t-test gives more weight to the treatment fold change. It can therefore be viewed as a compromise between the ordinary t-test and ranking by fold-change.
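
To see the "compromise" in numbers, here is a minimal Python sketch of a one-sample moderated t-statistic. It assumes the usual shrinkage form, where the gene's variance is replaced by a weighted average of its own variance and a prior variance. The prior values and the two example genes are made up for illustration; in a real analysis the prior is estimated from all genes by the fitting software, not set by hand.

```python
import math
from statistics import mean, variance


def ordinary_t(log_ratios):
    """One-sample t-statistic testing whether the mean log-ratio is zero."""
    n = len(log_ratios)
    return mean(log_ratios) / math.sqrt(variance(log_ratios) / n)


def moderated_t(log_ratios, prior_var, prior_df):
    """Same test, but with the gene's variance shrunk towards prior_var."""
    n = len(log_ratios)
    df = n - 1
    s2 = variance(log_ratios)
    # Shrunken variance: weighted average of the prior and observed variances.
    s2_post = (prior_df * prior_var + df * s2) / (prior_df + df)
    return mean(log_ratios) / math.sqrt(s2_post / n)


# Hypothetical prior (assumption: normally estimated from the whole data set).
PRIOR_VAR, PRIOR_DF = 0.05, 4.0

tight = [0.30, 0.31, 0.29, 0.30]  # small fold change, tiny variance
noisy = [2.0, 0.5, 3.0, 1.5]      # large fold change, large variance

for name, gene in [("tight", tight), ("noisy", noisy)]:
    print(name,
          round(ordinary_t(gene), 2),
          round(moderated_t(gene, PRIOR_VAR, PRIOR_DF), 2))
```

With these made-up numbers the ordinary t-test ranks the "tight" gene far above the "noisy" one, while the moderated version ranks the gene with the larger fold change first.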
--------------------------------------------------------------------------------------------------------------
The basic difference between linear models and glms is that with linear models, the F-test and all the individual t-tests that make it up can be conducted as one computation, but for glms each individual 1df test and overall F-test have to all be separate computations.
--------------------------------------------------------------------------------------------------------------
Delta beta refers to change in Beta value relative to a group.
----------------------------------------------------------------------------------------------------------------
Pearson correlations are a marginal measure of association and thus sensitive to confounding factors.
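
A small Python sketch of what "marginal" means here. The data are constructed by hand (not real measurements) so that x and y are related only through a shared confounder z. The helper names are my own, and the partial correlation uses the standard first-order formula.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after adjusting both for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Constructed example: z drives both x and y; the added noise terms are
# uncorrelated with z and with each other.
z = [0, 0, 0, 0, 10, 10, 10, 10]
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]
x = [a + b for a, b in zip(z, noise_x)]
y = [a + b for a, b in zip(z, noise_y)]

print("marginal:", pearson(x, y))
print("adjusted for z:", partial_corr(x, y, z))
```

The marginal correlation is high even though, once z is accounted for, x and y have essentially nothing to do with each other.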
------------------------------------------------------------------------------------------------------
Agilent arrays - can't use robust-spline normalization, only global loess normalization is meaningful.

6:05 PM | Filed Under | 0 Comments

22/11/2012

Because I always forget:
2^number
###################
(HAO) AND "Homo sapiens"[porgn:__txid9606]

5:46 PM | Filed Under | 0 Comments

19/11/2012

scvitic
spartak 25
[Note the space!]
##############

8:08 PM | Filed Under | 0 Comments

Tuesday, 13th November 2012

Bpipe
hs_err_pid4.log - is a Java crash dump (a 'hotspot error log')
See if something is still running:
bpipe jobs
go to the directory of the pipeline and type
bpipe log
bpipe run -n 8
########
48 cores, 264G of memory
#####################
pathognomonic - characteristic or symptomatic of a disease or condition.
-------------------------------------------------------------------------
CNV arrays:
For any future work, ideally batches of > 50.
Less than 10 is basically ineffective.
-----------------------------------
ClinVar aggregates information about sequence variation and its relationship to human health. Because the resource is still under active development, our preliminary website release is limited to our preview site.

8:21 PM | Filed Under | 0 Comments

Thursday, 8th November 2012

grep -E "7$" hasoverlaps.bed.complete - get all lines ending in 7.
###################################################
Making a coloured .bed?
http://cloford.com/resources/colours/namedcol.htm
#######################################

8:26 PM | Filed Under | 0 Comments

Wednesday, 7th November 2012

If I ever revisit quantitative PCR (!), good reference: (PubMed)

##################################################

Perl - removing duplicates

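#!/usr/bin/perl
# Remove duplicate values from each line, keeping first-occurrence order.
use strict;
use warnings;
while (<>) {
    chomp;
    # First field is the ID, the rest of the line is the data; skip lines without data.
    my ($ID, $data) = /^(\S+)\s+(\S.*)/ or next;
    my @list = split /\s+/, $data;
    my %seen;
    # Keep a value only the first time it is seen.
    my @unique_list = grep { !$seen{$_}++ } @list;
    print "$ID\t" . join(" ", @unique_list) . "\n";
}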

Input:
UGT1A3r 7364 54575 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 7363 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727

Output:
UGT1A3r 7364 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727
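
For comparison, a rough Python sketch of the same idea (keep the ID, drop repeated values, preserve first-seen order). The function names are my own, and it relies on dicts keeping insertion order (Python 3.7+).

```python
def unique_in_order(values):
    """Drop repeats while keeping the order of first appearance."""
    return list(dict.fromkeys(values))


def dedupe_line(line):
    """First field is the ID; the remaining fields are de-duplicated."""
    ident, *values = line.split()
    return ident + "\t" + " ".join(unique_in_order(values))


lines = [
    "UGT1A3r 7364 54575 54575",
    "UGT1A8r 7363 54575 7363 54576",
    "UMPK 51727",
]
for line in lines:
    print(dedupe_line(line))
```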

2:13 PM | Filed Under | 0 Comments

5th November 2012

The R function sweep
Working with a matrix, want to change each row or column. Whether you operate by row or column is defined by MARGIN, as for apply (1 = rows, 2 = columns).
E.g. add 1 to the 1st column, 2 to the 2nd, etc, of a 4-column matrix:
sweep(mymatrix, 2, 1:4, "+")
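
To keep the row/column distinction straight, here is a rough pure-Python analogue of the idea. It is a toy of my own on lists of lists, not R's implementation; the margin numbering (1 = rows, 2 = columns) and the default of subtraction are chosen to mirror R.

```python
import operator


def sweep(matrix, margin, stats, func=operator.sub):
    """Apply func(value, stat) across a matrix given as a list of rows.

    margin=1: stats[i] is used for every element of row i.
    margin=2: stats[j] is used for every element of column j.
    """
    if margin == 1:
        return [[func(v, stats[i]) for v in row] for i, row in enumerate(matrix)]
    if margin == 2:
        return [[func(v, stats[j]) for j, v in enumerate(row)] for row in matrix]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")


m = [[0] * 4 for _ in range(3)]                  # 3 rows, 4 columns of zeros
by_col = sweep(m, 2, [1, 2, 3, 4], operator.add)  # 1 to 1st column, 2 to 2nd, ...
by_row = sweep(m, 1, [1, 2, 3], operator.add)     # 1 to 1st row, 2 to 2nd, ...
print(by_col)
print(by_row)
```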

#############################################

find /path/to/files/ -name "session_*" -delete

#############################################

Type:

python

from Bio import SeqIO

SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.fasta", "fasta")

SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.qual", "qual")

#################################################

tar -zcvf archive_name.tar.gz directory_to_compress

##################################################
Reading in big files
library(sqldf)
f <- file("mybigfile")
system.time(bigdf <- sqldf("select * from f", dbname = tempfile(), file.format = list(header = F, row.names = F)))
See here

4:30 PM | Filed Under | 0 Comments

awk one liner

AWK
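splitting a file on a pattern and naming the output files (each line matching ENDMDL starts the next output file; lines before the first match go to "file.pdb."):

awk '/ENDMDL/{i++}{print > ("file.pdb." i)}' file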
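
The same splitting logic as a Python sketch, for when the chunks are wanted in memory rather than in files. The function name is my own, and it uses a plain substring test where awk uses a regular expression. As in the awk version, the counter is bumped before the matching line is stored, so each matching line opens the next chunk; unlike awk, the first chunk is numbered 0 instead of having an empty suffix.

```python
def split_on_pattern(lines, pattern="ENDMDL"):
    """Group lines into numbered chunks; a matching line starts a new chunk."""
    chunks = {}
    index = 0
    for line in lines:
        if pattern in line:
            index += 1
        chunks.setdefault(index, []).append(line)
    return chunks


example = ["ATOM 1", "ATOM 2", "ENDMDL", "ATOM 3", "ENDMDL"]
for number, chunk in split_on_pattern(example).items():
    print("file.pdb.%d" % number, chunk)
```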

1:08 PM | Filed Under | 0 Comments

better or dead!

"Any time you encounter a small annoyance in your daily computer life, think about how you could write a program to help solve that problem. Any time you find something interesting that you want to experiment with, do it. Play with new concepts and tools and languages, as much as possible. Get into the mentality that a day/week/month in which you have not learned something interesting is a failure. But most of all: write code. Every day, even if it's just a regular expression to search through your email history or something. Do something programming-ish as often as you can."

Hippocrates (c. 400BC):
"Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile"
"Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult."

5:37 PM | Filed Under | 0 Comments

back to base

transform

e.g.
head(airquality)
head(transform(airquality, Ozone = -Ozone)) # makes Ozone negative
head(transform(airquality, new = -Ozone, Temp = (Temp-32)/1.8)) # creates a new column called 'new' with negative ozone values and changes the values of the Temp column

# example of subset
one <- subset(bnames, sex == "boy" & year == 2008)
one$rank <- rank(-one$percent, ties.method = "first")
# or
one <- transform(one, rank = rank(-percent, ties.method = "first"))
# the use of minus means the rank values go from 1,2,3.. instead of 1000,999,998...

# Fine for one year at a time, to do all at once:
# Split
pieces <- split(bnames, list(bnames$sex, bnames$year))
# Apply
results <- vector("list", length(pieces))
for(i in seq_along(pieces)) {
  piece <- pieces[[i]]
  piece <- transform(piece, rank = rank(-percent, ties.method = "first"))
  results[[i]] <- piece
}
# Combine
result <- do.call("rbind", results)
# Or equivalently
bnames <- ddply(bnames, c("sex", "year"), transform, rank = rank(-percent, ties.method = "first"))
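
The split / apply / combine pattern is not specific to R. Here is a minimal Python sketch of the same ranking-within-groups idea on a list of dicts. The rows are invented for illustration (they are not the real bnames data) and the function name is my own. One difference from the R version: the combined result comes back ordered by rank within each group, not in the original row order.

```python
from collections import defaultdict


def rank_within_groups(rows, keys=("sex", "year"), value="percent"):
    """Add a 'rank' field (1 = largest value) computed separately per group."""
    # Split: bucket rows by their group key.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    # Apply and combine: rank each group, then collect everything in one list.
    result = []
    for members in groups.values():
        # sorted() is stable, so ties keep their input order (ties "first").
        ordered = sorted(members, key=lambda r: -r[value])
        for rank, row in enumerate(ordered, start=1):
            result.append({**row, "rank": rank})
    return result


# Hypothetical rows, for illustration only.
rows = [
    {"name": "Jacob", "sex": "boy", "year": 2008, "percent": 0.010},
    {"name": "Michael", "sex": "boy", "year": 2008, "percent": 0.009},
    {"name": "Emma", "sex": "girl", "year": 2008, "percent": 0.009},
    {"name": "Ethan", "sex": "boy", "year": 2008, "percent": 0.009},
]
for row in rank_within_groups(rows):
    print(row["sex"], row["year"], row["name"], row["rank"])
```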

7:25 PM | Filed Under | 0 Comments

Another thing often forgotten sometimes useful

log2-fold-changes are contained in fit2$coefficients.

9:18 PM | Filed Under | 0 Comments

R history

So again, something I have been meaning to look into: R history. By default only 512 lines are saved. Often I like to tinker with something until I have worked out exactly what I want to do, so saving a perfect script from the word go is not an option. So: Sys.setenv(R_HISTSIZE = Inf) *should* save all of my history, which I can then export, edit and save as a .R file.

5:18 PM | Filed Under | 0 Comments

Google is amazing.

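Have been meaning to look this up for ages: how to set the cran mirror permanently. One way is to put options(repos = c(CRAN = "https://cloud.r-project.org")) in ~/.Rprofile, swapping in whichever mirror you prefer.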

3:14 PM | Filed Under | 0 Comments

Graphics

This is general, not just R specific, and I always forget the reasons, so I am posting them here: JPEGs and PNGs are raster graphics that do not scale well without becoming pixellated, whereas PDFs are vector graphics that will scale well.

9:07 PM | Filed Under | 0 Comments

Learning R

Books to think about buying: http://www.springer.com/series/6991?detailsPage=titles

9:49 PM | Filed Under | 0 Comments

Coursera - Mathematical Biostatistics Bootcamp - HWK Quiz 1

P(A U B) is always equal to... (DeMorgan's laws)
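
The identity the De Morgan hint presumably points at is P(A U B) = 1 - P(A^c n B^c), since the complement of a union is the intersection of the complements. A quick Python check on a small sample space (two dice, with events I picked arbitrarily) makes it concrete, and also confirms the inclusion-exclusion form.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a subset of omega) under equal weights."""
    return Fraction(len(event), len(omega))


A = {o for o in omega if o[0] == 6}    # first die shows a six
B = {o for o in omega if sum(o) >= 10}  # total is at least ten

lhs = prob(A | B)
de_morgan = 1 - prob((omega - A) & (omega - B))
incl_excl = prob(A) + prob(B) - prob(A & B)
print(lhs, de_morgan, incl_excl)
```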

5:26 PM | Filed Under | 0 Comments

Coursera

Fabulous, amazing, just started but has enormous promise.

10:55 PM | Filed Under | 0 Comments

Today is the first day...

So!, no 1, become a better programmer, best to continue with what I mainly use, i.e. R, so become a better programmer in R. Ways to do this? Program more often! I’ll start a list of documents in one of the side bars. As I get into work early I have the luxury of time to myself, so maybe about an hour every morning first thing? Key to R – functions! First up, “The Art of R Programming”.
TODO: snow package and parApply

7:54 PM | Filed Under | 0 Comments

Being 33SuperWoman
