Statistics Tutorials: The Use of Notation in Basic Statistics - Part I
One thing that confuses students very frequently, and I would say more than necessary, is the liberal use of mathematical notation in Statistics, even at basic levels. More often than is desirable, instructors use notation that students are unsure about. Teachers rightly see notation as a way of expressing ideas precisely, unequivocally, and compactly. But as ideas build up, the notation can become convoluted enough to leave students confused.
In the following paragraphs we will attempt to clarify the use of notation in Statistics from the bottom up, from notations in the most basic descriptive statistics, to the notation used in more sophisticated hypothesis tests.
Notation in Descriptive Statistics
The following symbols are commonly used when working with descriptive statistics. They remain in use throughout most of a Statistics course.
\(\bar{X}\): This is the sample mean, which corresponds to the arithmetic average of the values from a sample \({{X}_{1}}\), \({{X}_{2}}\),...,\({{X}_{n}}\). This is a statistic (because it is constructed with sample information). In some courses, especially in the Social and Behavioral Sciences, \(M\) is used to refer to the sample mean.
\({s}^{2}\): This is the sample variance, which is computed as
\[{{s}^{2}}=\frac{1}{n-1}\left( \sum\limits_{i=1}^{n}{X_{i}^{2}}-\frac{1}{n}{{\left( \sum\limits_{i=1}^{n}{{{X}_{i}}} \right)}^{2}} \right)\]
This is a statistic (because it is constructed with sample information). There are other versions of the above formula, but they all lead to the same numerical value.
\(s\): This is the sample standard deviation, which is computed by taking the square root of the sample variance, or directly from the sample data \({X}_{1}\), \({{X}_{2}}\),...,\({X}_{n}\) using the formula below:
\[s=\sqrt{\frac{1}{n-1}\left( \sum\limits_{i=1}^{n}{X_{i}^{2}}-\frac{1}{n}{{\left( \sum\limits_{i=1}^{n}{{{X}_{i}}} \right)}^{2}} \right)}\]
This is a statistic (because it is constructed with sample information). There are other versions of the above formula, but they all lead to the same numerical value.
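As an illustration, the three statistics defined so far can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the sample values are made up for this example, and the standard library's `statistics` module is used solely as a cross-check.

```python
import math
import statistics


def sample_mean(xs):
    """Sample mean: the arithmetic average of the sample values."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance s^2 via the computational formula given above."""
    n = len(xs)
    sum_of_squared_values = sum(x * x for x in xs)
    squared_sum_of_values = sum(xs) ** 2
    return (sum_of_squared_values - squared_sum_of_values / n) / (n - 1)


def sample_std(xs):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Hypothetical sample X_1, ..., X_n (illustrative values only)
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

x_bar = sample_mean(sample)
s2 = sample_variance(sample)
s = sample_std(sample)

print("mean:", x_bar)
print("variance:", s2)
print("standard deviation:", s)

# Cross-check against the standard library (both use the n - 1 divisor)
print("library variance:", statistics.variance(sample))
print("library standard deviation:", statistics.stdev(sample))
```

Note that the divisor is \(n-1\), not \(n\), which is what distinguishes the sample variance from the population variance.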
\(SS\): This is the "sum of squares". This statistic measures the squared variation of a variable \(X\) with respect to the sample mean. If you have a sample \({{X}_{1}}\), \({X}_{2}\),...,\({{X}_{n}}\), the formula used to compute it is
\[SS=\sum\limits_{i=1}^{n}{{{\left( {{X}_{i}}-\bar{X} \right)}^{2}}}\]
Often, a subscript is used to indicate which variable we refer to, if it is not clear from context. For example, you can write \(S{{S}_{X}}\) to refer to the sum of squares of variable \(X\), or \(S{{S}_{Y}}\) to refer to the sum of squares of variable \(Y\). In the Social and Behavioral Sciences, you will typically write the sum of squares of \(X\) as \(SS_{XX}\) instead of \(SS_{X}\), but this is simply a matter of which notation is preferred. There are other equivalent expressions for the sum of squares. For example, here are two alternative ways to write it:
\[S{{S}_{XX}}=\sum\limits_{i=1}^{n}{{{\left( {{X}_{i}}-\bar{X} \right)}^{2}}}=\sum\limits_{i=1}^{n}{X_{i}^{2}}-\frac{1}{n}{{\left( \sum\limits_{i=1}^{n}{{{X}_{i}}} \right)}^{2}}\]
Based on the above, there is a clear link between the sample variance and the sum of squares:
\[{{s}^{2}}=\frac{S{{S}_{XX}}}{n-1}\]
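The equivalence of the two expressions for the sum of squares, and its link to the sample variance, can be checked numerically. The following is a minimal sketch; the function names and the data are assumptions made for illustration.

```python
def ss_definition(xs):
    """Sum of squares as squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def ss_computational(xs):
    """Sum of squares via the alternative (computational) expression."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Hypothetical sample (illustrative values only)
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)

ss_def = ss_definition(sample)
ss_comp = ss_computational(sample)

# Link between the sum of squares and the sample variance
s2 = ss_def / (n - 1)

print("SS (definition):", ss_def)
print("SS (computational):", ss_comp)
print("sample variance:", s2)
```

With floating-point data the two expressions can differ by rounding error, so it is safer to compare them with a small tolerance than with exact equality.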
Notice that notation is sometimes excessive and sometimes inconsistent. Indeed, it is very common to use a subscript for the sum of squares (as in \(S{{S}_{XX}}\)) to indicate which variable we are referring to (\(X\) in this case). For the variance or standard deviation, such subscripts are less common, although still acceptable. For example, you can write \({{s}_{X}}\) to specify the sample standard deviation of variable \(X\), or more precisely, \({{s}_{X}}\) indicates the sample standard deviation computed from the sample \({{X}_{1}}\), \({{X}_{2}}\),...,\({{X}_{n}}\) that comes from the random variable \(X\).
\(m\): This is the sample median, the point (or interpolated point) that marks the middle of the distribution. There is no universal agreement about referring to the sample median as \(m\), but it is common practice.
\({{Q}_{j}}\): This is the j-th quartile, with \(j=1,2,3\). These are the points (or interpolated points) that divide the distribution into quarters. Notice that \({{Q}_{2}}\) is the median.
\({{P}_{x}}\): This is the x-th percentile, with \(0\le x\le 100\). These are the points (or interpolated points) such that x percent of the distribution lies to their left. Observe that \(m={{Q}_{2}}={{P}_{50}}\).
IQR: This is the interquartile range, and it is defined as \(IQR={{Q}_{3}}-{{Q}_{1}}\), which is the difference between the third and first quartiles. This is commonly used as a measure of dispersion and to detect outliers.
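The median, quartiles, percentiles, and interquartile range can be sketched with Python's standard `statistics` module (the `quantiles` function requires Python 3.8 or later). The data are made up for illustration. Textbooks and software packages use different interpolation rules, so the quartile values produced here may not match a particular course's convention.

```python
import statistics

# Hypothetical sample (illustrative values only)
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

# Sample median m
m = statistics.median(sample)

# Quartiles Q1, Q2, Q3: the three cut points that split the data in quarters
q1, q2, q3 = statistics.quantiles(sample, n=4)

# Percentiles P1, ..., P99: with n=100 the function returns 99 cut points,
# so P_x sits at index x - 1
percentiles = statistics.quantiles(sample, n=100)
p50 = percentiles[49]

# Interquartile range
iqr = q3 - q1

print("median:", m)
print("quartiles:", q1, q2, q3)
print("50th percentile:", p50)
print("IQR:", iqr)
```

Whatever interpolation rule is used, the identity \(m={{Q}_{2}}={{P}_{50}}\) should hold, which the printed values let you confirm.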
Other descriptive statistics: There are many less commonly used descriptive statistics for which there are no universal symbols. For example, skewness, kurtosis, higher-order moments, etc., are sometimes used, but no compact symbols are universally used to denote them.