http://www.percona.com/blog/2015/03/17/mysql-qa-linux-upskill-bash-gnu-tools-scripting-fun/
gnuwin32.sourceforge.net/
Just a collection of some random cool stuff. P.S. Almost 99% of the content here is not mine and I take no credit for it; I reference and copy parts of the interesting sections.
Wednesday, April 29, 2015
Thursday, March 12, 2015
Points of Significance in Biology - Nature column
http://www.nature.com/nmeth/journal/v10/n9/full/nmeth.2613.html
http://www.nature.com/collections/qghhqm
Since September 2013 Nature Methods has been publishing a monthly column on statistics aimed at providing researchers in biology with a basic introduction to core statistical concepts and methods, including experimental design. Although targeted at biologists, the articles are useful guides for researchers in other disciplines as well. A continuously updated list of these articles is provided below.
Importance of being uncertain - How samples are used to estimate population statistics and what this means in terms of uncertainty.
Error Bars - The use of error bars to represent uncertainty and advice on how to interpret them.
Significance, P values and t-tests - Introduction to the concept of statistical significance and the one-sample t-test.
Power and sample size - Using statistical power to optimize study design and sample numbers.
Visualizing samples with box plots - Introduction to box plots and their use to illustrate the spread and differences of samples. See also: Kick the bar chart habit and BoxPlotR: a web tool for generation of box plots
Comparing samples—part I - How to use the two-sample t-test to compare either uncorrelated or correlated samples.
Comparing samples—part II - Adjustment and reinterpretation of P values when large numbers of tests are performed.
Nonparametric tests - Use of nonparametric tests to robustly compare skewed or ranked data.
Designing comparative experiments - The first of a series of columns that tackle experimental design shows how a paired design achieves sensitivity and specificity requirements despite biological and technical variability.
Analysis of variance and blocking - Introduction to ANOVA and the importance of blocking in good experimental design to mitigate experimental error and the impact of factors not under study.
Replication - Technical replication reveals technical variation while biological replication is required for biological inference.
Nested designs - Use the relative noise contribution of each layer in nested experimental designs to optimally allocate experimental resources using ANOVA.
Two-factor designs - It is common in biological systems for multiple experimental factors to produce interacting effects on a system. A study design that allows these interactions can increase sensitivity.
Sources of variation - To generalize experimental conclusions to a population, it is critical to sample its variation while using experimental control, randomization, blocking and replication to collect replicable and meaningful results.
Matrix factorization - R-bloggers
http://www.r-bloggers.com/testing-recommender-systems-in-r/
http://www.r-bloggers.com/matrix-factorization/
Latent Factor Matrix Factorization.
Singular value decomposition
optim() function in R for optimization
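The post covers latent-factor matrix factorization, singular value decomposition, and R's optim(). As a minimal sketch of the SVD route in Python/NumPy (the toy ratings matrix and the choice of k = 2 latent factors are illustrative, not from the post):

```python
import numpy as np

# Toy user-by-item ratings matrix (rows: users, columns: items).
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Full SVD: R = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest latent factors (rank-k approximation).
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-k reconstruction is close to the original when the data
# really is dominated by k latent factors (here: two taste groups).
rmse = np.sqrt(np.mean((R - R_hat) ** 2))
print(f"rank-{k} RMSE: {rmse:.3f}")
```

The same low-rank idea underlies the optim()-based approach in the post: instead of a closed-form SVD, the factor matrices are found by numerically minimizing the reconstruction error.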
http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
In-depth introduction to machine learning in 15 hours of expert videos
Wednesday, March 11, 2015
The Elements of Programming Style
http://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
- Write clearly -- don't be too clever.
- Say what you mean, simply and directly.
- Use library functions whenever feasible.
- Avoid too many temporary variables.
- Write clearly -- don't sacrifice clarity for efficiency.
- Let the machine do the dirty work.
- Replace repetitive expressions by calls to common functions.
- Parenthesize to avoid ambiguity.
- Choose variable names that won't be confused.
- Avoid unnecessary branches.
- If a logical expression is hard to understand, try transforming it.
- Choose a data representation that makes the program simple.
- Write first in easy-to-understand pseudo language; then translate into whatever language you have to use.
- Modularize. Use procedures and functions.
- Avoid gotos completely if you can keep the program readable.
- Don't patch bad code -- rewrite it.
- Write and test a big program in small pieces.
- Use recursive procedures for recursively-defined data structures.
- Test input for plausibility and validity.
- Make sure input doesn't violate the limits of the program.
- Terminate input by end-of-file marker, not by count.
- Identify bad input; recover if possible.
- Make input easy to prepare and output self-explanatory.
- Use uniform input formats.
- Make input easy to proofread.
- Use self-identifying input. Allow defaults. Echo both on output.
- Make sure all variables are initialized before use.
- Don't stop at one bug.
- Use debugging compilers.
- Watch out for off-by-one errors.
- Take care to branch the right way on equality.
- Be careful if a loop exits to the same place from the middle and the bottom.
- Make sure your code does "nothing" gracefully.
- Test programs at their boundary values.
- Check some answers by hand.
- 10.0 times 0.1 is hardly ever 1.0.
- 7/8 is zero while 7.0/8.0 is not zero.
- Don't compare floating point numbers solely for equality.
- Make it right before you make it faster.
- Make it fail-safe before you make it faster.
- Make it clear before you make it faster.
- Don't sacrifice clarity for small gains in efficiency.
- Let your compiler do the simple optimizations.
- Don't strain to re-use code; reorganize instead.
- Make sure special cases are truly special.
- Keep it simple to make it faster.
- Don't diddle code to make it faster -- find a better algorithm.
- Instrument your programs. Measure before making efficiency changes.
- Make sure comments and code agree.
- Don't just echo the code with comments -- make every comment count.
- Don't comment bad code -- rewrite it.
- Use variable names that mean something.
- Use statement labels that mean something.
- Format a program to help the reader understand it.
- Document your data layouts.
- Don't over-comment.
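The floating-point rules above (10.0 × 0.1, 7/8 versus 7.0/8.0, equality comparison) are easy to demonstrate. A small Python sketch — note that on modern IEEE 754 hardware 10.0 * 0.1 does happen to round to exactly 1.0, but accumulated rounding error shows the same effect:

```python
import math

# Integer division truncates; float division does not.
assert 7 // 8 == 0          # integer division: zero
assert 7.0 / 8.0 == 0.875   # float division: not zero

# Repeated addition of 0.1 accumulates rounding error,
# so the sum is not exactly 1.0.
total = sum([0.1] * 10)
print(total)
assert total != 1.0

# Don't compare floating point numbers solely for equality:
# compare with a tolerance instead.
assert math.isclose(total, 1.0, rel_tol=1e-9)
```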
The Elements of Style
Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that he make every word tell.
—"Elementary Principles of Composition", The Elements of Style
Research methods: Know when your numbers are significant
Term | Meaning | Common uses
---|---|---
Standard deviation (s.d.) | The typical difference between each value and the mean value. | Describing how broadly the sample values are distributed. s.d. = √(∑(x − mean)² / (N − 1))
Standard error of the mean (s.e.m.) | An estimate of how variable the means will be if the experiment is repeated multiple times. | Inferring where the population mean is likely to lie, or whether sets of samples are likely to come from the same population. s.e.m. = s.d./√N
Confidence interval (CI; 95%) | With 95% confidence, the population mean will lie in this interval. | To infer where the population mean lies, and to compare two populations. CI = mean ± s.e.m. × t(N−1)
Independent data | Values from separate experiments of the same type that are not linked. | Testing hypotheses about the population.
Replicate data | Values from experiments where everything is linked as much as possible. | Serves as an internal check on performance of an experiment.
Sampling error | Variation caused by sampling part of a population rather than measuring the whole population. | Can reveal bias in the data (if it is too small) or problems with conduct of the experiment (if it is too big). In binomial distributions (such as live and dead cell counts) the expected s.d. is √(N × p × (1 − p)); in Poisson distributions it is √N, since the variance equals the mean.

N, number of independent samples; t, the t-statistic; p, probability.
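The table's formulas can be sketched with only the Python standard library. The sample values are hypothetical, and the t multiplier 2.776 for N − 1 = 4 degrees of freedom is the standard two-sided 95% t-table value:

```python
import math
import statistics

values = [4.7, 5.1, 4.9, 5.3, 4.5]   # hypothetical sample, N = 5
N = len(values)
mean = statistics.mean(values)

# s.d. = sqrt(sum((x - mean)^2) / (N - 1))
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (N - 1))
assert math.isclose(sd, statistics.stdev(values))  # matches the stdlib

# s.e.m. = s.d. / sqrt(N)
sem = sd / math.sqrt(N)

# 95% CI = mean ± s.e.m. × t(N−1); t ≈ 2.776 for 4 degrees of freedom
t_crit = 2.776
ci = (mean - t_crit * sem, mean + t_crit * sem)
print(f"mean={mean:.2f}  s.d.={sd:.3f}  s.e.m.={sem:.3f}  "
      f"95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```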
Monday, March 9, 2015
Points of significance: Power and sample size
Martin Krzywinski & Naomi Altman, Nature Methods 10, 1139–1140 (2013). doi:10.1038/nmeth.2738
http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2738.html
Figure 3: Decreasing specificity increases power.
(a) Observations are assumed to be from the null distribution (H0) with mean μ0. We reject H0 for values larger than x* with an error rate α (red area). (b) The alternative hypothesis (HA) is the competing scenario with a different mean μA. Values sampled from HA smaller than x* do not trigger rejection of H0 and occur at a rate β. Power (sensitivity) is 1 − β (blue area). (c) Relationship of inference errors to x*. The color key is same as in Figure 1.
Figure 4: Impact of sample (n) and effect size (d) on power.
H0 and HA are assumed normal with σ = 1. (a) Increasing n decreases the spread of the distribution of sample averages in proportion to 1/√n. Shown are scenarios at n = 1, 3 and 7 for d = 1 and α = 0.05. Right, power as function of n at four different α values for d = 1. The circles correspond to the three scenarios. (b) Power increases with d, making it easier to detect larger effects. The distributions show effect sizes d = 1, 1.5 and 2 for n = 3 and α = 0.05. Right, power as function of d at four different α values for n = 3.
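The power calculation the figures describe can be sketched numerically with the Python standard library. As in the captions, H0 and HA are normal with σ = 1, μ0 = 0 and μA = d, and the sample mean has standard deviation σ/√n:

```python
import math
from statistics import NormalDist

def power(d: float, n: int, alpha: float, sigma: float = 1.0) -> float:
    """One-sided power to detect a mean shift d with n samples."""
    se = sigma / math.sqrt(n)              # s.d. of the sample mean
    # Rejection cutoff x*: P(sample mean > x* | H0) = alpha
    x_star = NormalDist().inv_cdf(1 - alpha) * se
    # Power = 1 - beta = P(sample mean > x* | HA with mean d)
    return 1 - NormalDist(mu=d, sigma=se).cdf(x_star)

# Scenarios from Figure 4a: d = 1, alpha = 0.05, n = 1, 3, 7
for n in (1, 3, 7):
    print(f"n={n}: power={power(d=1, n=n, alpha=0.05):.2f}")
```

Increasing n shrinks the spread of the sample-mean distribution, so more of HA falls beyond the cutoff x* and power rises, exactly the 1/√n effect shown in the figure.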
Monday, January 12, 2015
Anaconda / Miniconda - Python package managers
http://docs.continuum.io/anaconda/index.html
http://conda.pydata.org/miniconda.html
Anaconda is a free collection of powerful packages for Python that enables large-scale data management, analysis, and visualization for Business Intelligence, Scientific Analysis, Engineering, Machine Learning, and more.
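A minimal workflow with the conda CLI after installing Miniconda (the environment name and package choices here are illustrative, not prescribed by the docs):

```shell
# Create an isolated environment with its own Python interpreter.
conda create -n analysis python=3.4   # "analysis" is a hypothetical name

# Activate it (conda 4.4+ also accepts `conda activate analysis`).
source activate analysis

# Install binary packages and their dependencies into the environment.
conda install numpy pandas matplotlib

# List what is installed in the active environment.
conda list
```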