Statistics: an assumption of the linear regression model

In statistics tutoring, linear regression is perennial. The tutor mentions one of its assumptions.

When appropriate, linear regression models data by the equation

y = a + bx + e,

e being an error term due to variability.

An inherent assumption of linear regression modelling is that the error term, e, does not depend on the actual data value, x.

In many lab environments, the assumption that error magnitude does not depend on the measurement’s magnitude makes sense. For instance, measuring with a ruler, the error is often set to ±0.5 mm, regardless of the length measured.

For some types of data, however, the measurement’s magnitude seems to impact its error magnitude. An example might be inventory counting. One imagines that, counting only three items, the error would likely be 0. Counting a thousand, however, would more likely yield an observation a few off from the real number present, and so on.

Perhaps the point is that the data has to be measured or observed, which itself brings error, perhaps dependent on the size of the measurement itself.
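
One way to see the difference between the two situations is to simulate them. The sketch below is only an illustration, not something from the source: the values of a and b, the error sizes, and the helper names fit_line and spread_ratio are all assumptions. It fits a least-squares line to simulated data, then compares the spread of the residuals for large x with the spread for small x.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def spread_ratio(error_sd, n=2000, seed=1):
    """Simulate y = a + b*x + e, fit a line, and return
    (residual std dev for large x) / (residual std dev for small x).

    error_sd is a function giving the error's standard deviation at x.
    """
    rng = random.Random(seed)
    a_true, b_true = 5.0, 2.0  # arbitrary values, chosen only for illustration
    xs = [rng.uniform(1, 100) for _ in range(n)]
    ys = [a_true + b_true * x + rng.gauss(0, error_sd(x)) for x in xs]

    a, b = fit_line(xs, ys)
    # Sort by x so the first half holds the small x values.
    residuals = sorted((x, y - (a + b * x)) for x, y in zip(xs, ys))
    half = n // 2
    small_x = [r for _, r in residuals[:half]]
    large_x = [r for _, r in residuals[half:]]
    return statistics.pstdev(large_x) / statistics.pstdev(small_x)


# Ruler-like case: the error size is the same at every x.
constant_ratio = spread_ratio(lambda x: 3.0)

# Inventory-like case: the error size grows with x.
growing_ratio = spread_ratio(lambda x: 0.05 * x)

print(f"constant error: ratio = {constant_ratio:.2f}")
print(f"growing error:  ratio = {growing_ratio:.2f}")
```

A ratio near 1 is consistent with the assumption that e does not depend on x. A ratio well above 1 suggests the error grows with x, as in the inventory example.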

Source:

Harnett, Donald L. and James L. Murphy. Statistical Analysis for Business and Economics. Don Mills: Addison-Wesley, 1986.

Jack of Oracle Tutoring by Jack and Diane, Campbell River, BC.

Calculator usage: finding final price after discount and tax with the HP-10B

Tutoring financial math, you might often use the HP-10B. The tutor shows how easily it can apply a discount then add tax to get the final price.

Example: Imagine a handbag is regular price $85 but is discounted by 20%. Assuming 12% sales tax, find final price using the HP-10B.

Solution:

  1. Key in 85
  2. Key in – 20 % =
  3. Key in + 12 % =
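
To check the calculator's answer, the same arithmetic can be sketched in Python. This assumes each percentage is taken of the running total, so the tax applies to the discounted price; the function name final_price is made up for illustration.

```python
def final_price(regular_price, discount_percent, tax_percent):
    """Apply a percentage discount, then add percentage tax to the result."""
    discounted = regular_price - regular_price * discount_percent / 100
    with_tax = discounted + discounted * tax_percent / 100
    return round(with_tax, 2)  # round to the nearest cent


print(final_price(85, 20, 12))  # 76.16
```

The discounted price is $68, and 12% tax on $68 is $8.16, for a final price of $76.16.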

HTH:)

Source:

Hewlett-Packard HP-10B Business Calculator Owner’s Manual. Corvallis: Hewlett-Packard, 1994.


Calculator usage: memory on the HP-10B

Tutoring financial math, you’ll likely encounter the capable HP-10B. The tutor tells how to use its user-accessible memory.

The HP-10B seems to have 11 dedicated places to store your own numbers. The locations are at 0 to 9, plus there is the M register.

Example: On the HP-10B, store the number 65.21 in register 5.

  1. Key in 65.21
  2. Press the orange key
  3. Press RCL
  4. Press 5

To retrieve the number,

  1. Press RCL
  2. Press 5

HTH:)

Source:

Hewlett-Packard HP-10B Business Calculator Owner’s Manual. Corvallis: Hewlett-Packard, 1994.


Microsoft Word 2007: changing date format

In Business English tutoring, word processing formats inevitably arise. The tutor suggests how to change the date format in Microsoft Word 2007.

I prefer the date format April 14, 2017. There’s no chance of misunderstanding it, since the month is a proper noun and the year is four digits.

On my computer, anyway, Microsoft Office Word 2007 prefers the format 2017-04-14. What can a user do to change the date format?

  1. Click Insert at the top (next to Home).
  2. Locate, in the Text area of the toolbar (probably right side), the option Date and Time. Click it.
  3. You will be offered a menu of date formats.
  4. On mine, among the English (Canada) choices, the closest to April 14, 2017 is 14 April 2017.

Of course, the user can type any date format they want. However, Word will try to autocorrect it to the chosen (or else default) format. To stop the autocorrect, simply left-click when it’s offered. (Make sure the mouse pointer is on the cursor when you do so, or else your cursor might be sent wherever the mouse pointer is.)

Source:

superuser.com


Statistics, spreadsheets: confidence interval for population mean: CONFIDENCE() function on Excel and LibreOffice Calc

Tutoring statistics, you realize how convenient using a spreadsheet can be.

In yesterday’s post I mentioned some theoretical points about two-sided confidence intervals for the population mean.

On the practical side, if you simply need a confidence interval for the population mean, you can use Excel’s CONFIDENCE() function, which works the same on LibreOffice Calc. It has the following format:

=confidence(1-confidence_level, pop_standard_deviation, sample_size)

The formula assumes the population standard deviation is known. If not, you can just use a sample_size ≥31, calculate the sample standard deviation, and use it. This gives a pretty good approximation (see yesterday’s post).

The CONFIDENCE() formula gives the margin of error for the confidence interval. To get the actual lower and upper bounds, you both subtract and add its output to the sample mean.

Example:

Imagine an exam written by 706 students. A sample of 42 papers reveals a mean grade of 67.3 and standard deviation 12.4. Give a 95% confidence interval for the mean exam mark.

Solution:

The confidence level is 95% = 0.95, so the first parameter is 1-0.95=0.05.

In a cell, key

=confidence(0.05, 12.4, 42)

Hopefully, you obtain the output 3.75, which means the confidence interval for the mean is given by

67.3±3.75

or

63.55 to 71.05

Apparently the mean, with 95% confidence, is between 63.55 and 71.05.
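
For anyone who wants to check the spreadsheet's answer, here is a minimal Python sketch of the same calculation. It assumes the margin of error comes from the normal distribution (critical value times standard deviation over the square root of the sample size); the function name margin_of_error is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(alpha, standard_dev, sample_size):
    """Normal-based margin of error, with alpha = 1 - confidence_level."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * standard_dev / sqrt(sample_size)


sample_mean = 67.3
margin = margin_of_error(0.05, 12.4, 42)
lower = sample_mean - margin
upper = sample_mean + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))  # 3.75 63.55 71.05
```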

HTH:)


Statistics: confidence interval for the mean (two sided)

In statistics tutoring, confidence intervals are important.

A two-sided confidence interval for the population mean is given by

sample_mean – (standard_dev/n^(1/2))*sig_factor, sample_mean + (standard_dev/n^(1/2))*sig_factor

The sig_factor (significance factor) depends on the certainty (confidence level) with which we want the confidence interval to include the population mean; for 95% confidence it’s about 2 (more precisely, 1.96).

The standard deviation might be known or might be calculated from the sample itself. If it’s known, the normal distribution is used; if calculated, then technically the t-distribution should be used (see point 3 below).

There are a few points that make the two-sided confidence interval for the population mean an elegant construct:

  1. Its lower and upper boundaries depend on the sample size, but not the population size.
  2. For sample size n≥31, the parent population needn’t be normal for the sample mean to be normally distributed. This validates the confidence interval even for a non-normal population for n≥31. It’s a consequence of the Central Limit Theorem. (Actually, the rule of thumb is n≥30, but for the purpose of the next point, I like 31.)
  3. For n≥31, the t-distribution approximates the normal to around 4%, so the normal approximation can probably be used even for unknown population standard deviation.
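
Point 3 can be checked with a few lines of Python. This is only a sketch: the normal value comes from the standard library, while the t value 2.042 (two-sided 95%, 30 degrees of freedom, which corresponds to n = 31) is typed in from a standard t-table rather than computed. The variable names are made up for illustration.

```python
from statistics import NormalDist

# Two-sided 95% critical value from the normal distribution.
z_95 = NormalDist().inv_cdf(0.975)

# Two-sided 95% critical value from a t-table, 30 degrees of freedom (n = 31).
t_95_df30 = 2.042

percent_gap = (t_95_df30 - z_95) / z_95 * 100

print(f"z = {z_95:.3f}, t = {t_95_df30:.3f}, gap = {percent_gap:.1f}%")
# z = 1.960, t = 2.042, gap = 4.2%
```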

Source:

Harnett, Donald L. and James L. Murphy. Statistical Analysis for Business and Economics, first Can. ed. Don Mills: Addison-Wesley, 1993.


Spreadsheets: Excel or LibreOffice Calc: sumif(), continued

Tutoring spreadsheets, here’s the next level of complexity with the sumif() function: a separate test range.

In my March 9 post I brought up sumif(), pointing out that it will selectively add values within a given range.

sumif() offers one more level of flexibility: you can use a different test range from the sum range.

Example: Consider the following partial spreadsheet:

A B
110 green
40 red
50 red
90 green
70 green

Let’s imagine that, above, 110 is in the A1 position.

The sumif() formula can be used with the following format:

=sumif(decision_range,condition,range_to_add)

Note that, in this case, the first parameter is the range to check for agreement with the condition, while the range of numbers to potentially add is the final parameter.

Continuing with the example, let’s imagine that the formula

=sumif(b1:b5,"red",a1:a5)

is entered in cell c6. It will add only the cells adjacent to the value “red”, and return the value 90.

=sumif(decision_range,condition,range_to_add)

seems to work the same with Excel or LibreOffice Calc:)
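
For readers who think in code, here is a rough Python sketch of what the three-parameter form does with the example data. It is a simplification, not the spreadsheet's actual implementation: it handles only exact, case-sensitive matches (no conditions like ">50"), and the Python function sumif and its argument names are just borrowed from the format above.

```python
def sumif(decision_range, condition, range_to_add):
    """Add the values whose matching test cell equals the condition."""
    return sum(
        value
        for test, value in zip(decision_range, range_to_add)
        if test == condition
    )


column_a = [110, 40, 50, 90, 70]
column_b = ["green", "red", "red", "green", "green"]

print(sumif(column_b, "red", column_a))    # 90
print(sumif(column_b, "green", column_a))  # 270
```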


Spreadsheets: how to reference a cell on a different sheet in Excel and in LibreOffice Calc

In spreadsheet tutoring, cell references are important.

Let’s imagine you want a1 in the first sheet of a workbook to have the value of b1 in sheet 2.

Here’s how you can do so with text:

  • Excel: type, in a1 of the first sheet, =sheet2!b1
  • LibreOffice Calc: type, in a1 of the first sheet, =sheet2.b1

HTH:)


Spreadsheets: Excel, LibreOffice Calc: number formatting: how to get rid of E-05 (for example)

The tutor explains how to change from scientific notation to regular number format in Excel or LibreOffice Calc.

In my post from Feb 14 I mention that 1.69e-05 equals 0.0000169. Written 1.69e-05, the number is in scientific notation. Perhaps the user is not familiar, or not comfortable with that format – what can be done?

  1. Right-click the cell with the number in scientific notation.
  2. Click Format Cells…
  3. Click Number. There is a click box for the number of decimal places; if the number is in scientific notation with e-05 or such, you’ll likely need lots (maybe 10 or more). There are also choices for how you want the number to appear; select the one desired.
  4. Click OK: Hopefully the number will now be in “ordinary” format.
  5. You may need to widen the column to accommodate the number.
  6. If there is a positive number after the e, the cell contains a large value: for example, 1e+06 is 1 000 000. In such a case, you probably won’t need more decimal places, but might need to widen the column to accommodate the number in regular format:)
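
The same idea, that the stored number stays the same and only its display changes, can be seen in Python. This is just an illustration alongside the spreadsheet steps; the variable names are made up.

```python
small = 1.69e-05
large = 1e+06

default_view = str(small)           # scientific notation: '1.69e-05'
too_few_places = f"{small:.2f}"     # too few decimal places: '0.00'
enough_places = f"{small:.7f}"      # seven decimal places: '0.0000169'
large_regular = f"{large:,.0f}"     # no decimal places needed: '1,000,000'

print(default_view)
print(too_few_places)
print(enough_places)
print(large_regular)
```

With only two decimal places the small number shows as 0.00, which is why lots of decimal places may be needed for values like 1.69e-05.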


Brands: who owns Tetley?

The tutor shares another brand find.

I’ve known of Tetley since I can remember. Who owns the brand?

Apparently, since 2000, it’s been part of TATA Global Beverages. Tetley had been a UK company since 1837. It seems that Tetley introduced the tea bag in the UK, though it had been invented in the United States.

Today, Tetley is the world’s second biggest tea brand – Lipton is first.

Source:

www.tataglobalbeverages.com
time.com
www.tea.co.uk
