Lisa's Journal


Read my journals below:
 * Introduction Journal
 * Journal 1: Definitions/distinctions of populations & samples; attributes and variables; levels of measurement
 * Journal 2: Types of quantitative research designs; methods of quantitative data collection
 * Journal 3: Methods for displaying quantitative data
 * Journal 4: Univariate analysis, including frequency distributions, central tendencies, dispersion
 * Journal 5: Probability distributions
 * Journal 6: Sampling methods, sampling distributions, confidence intervals
 * Journal 7: Measures of association and correlation
 * Journal 8: Linear regression
 * Journal 9: Multiple regression
 * Journal 10: Statistical significance
 * Journal 11: Hypothesis testing
 * Journal 12: Standardized scores
 * Journal 13: Symmetry and skewness
 * Journal 14: Comparison of groups: t test, ANOVA
 * Journal 15: Inferential errors (Type 1 and Type 2)
 * Journal 16: Interactions, especially block-by-treatment forms
 * Journal Conclusion

A frequency distribution shows how often each value of a variable occurs in the sample group. If you take a histogram and draw a line connecting the tops (upper limits) of the bars, that line represents the frequency distribution.
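A frequency distribution can be sketched with a few lines of Python; the ages here are invented values, just to show the counting:

```python
# A frequency distribution simply counts how often each value occurs
# in the sample. The ages below are made up for illustration.
from collections import Counter

ages = [22, 25, 25, 30, 30, 30, 35, 40]

freq = Counter(ages)
for value in sorted(freq):
    print(value, freq[value])
# 22 1, 25 2, 30 3, 35 1, 40 1
```

Plotting these counts as bars gives the histogram; connecting the bar tops gives the frequency distribution line described above.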


 * CENTRAL TENDENCY: You can evaluate or represent the central tendency of the data in three ways:
 * 1) mode (mo) - nominal, ordinal, continuous
 * 2) median (mdn) - ordinal, continuous
 * 3) mean ( //x// ) - continuous

mean = Σx / n (the total of all values divided by the number of values)
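The three measures of central tendency can be computed directly; a minimal sketch using invented data values:

```python
# Mode, median, and mean for a small sample (values are made up).
import statistics

data = [2, 3, 3, 4, 5, 7, 11]

mo = statistics.mode(data)      # most frequent value -> 3
mdn = statistics.median(data)   # middle value of the sorted data -> 4
mean = sum(data) / len(data)    # mean = Σx / n -> 5.0

print(mo, mdn, mean)
```

Note that in this (right-skewed) sample the three values differ; in a normal distribution they coincide, as the next line describes.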

Normal distribution (bell curve) - mo = mdn = //x//
 * SHAPE OF DATA: The shape of the data is another way of interpreting it. By looking at the data through a visualization (its shape), the reviewer can make global interpretations. For me this is much like the diagnostic interpretation of flow-volume loops. The shape of a flow-volume loop (which compares two variables) gives the viewer an indication of physiological changes in the lungs (e.g., certain loops indicate restrictive lung disease such as sarcoidosis). So I am suggesting that the shape of the data can indicate types or classifications that can then be further "diagnosed".

Right skew (the skewer points to the right) - a tail stretching toward higher values creates a right skew. Also known as upward or positive skew. Left skew (the skewer points to the left) - a tail stretching toward lower values creates a left skew. Also known as downward or negative skew.
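A tiny sketch (with invented values) of how a right-skewed sample pulls the mean above the median:

```python
# A right-skewed (positively skewed) sample: most values cluster low,
# with a long tail of large values pulling the mean above the median.
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]

mean = sum(right_skewed) / len(right_skewed)   # 5.2, dragged toward the tail
mdn = statistics.median(right_skewed)          # 3.0, stays with the bulk

print(mean, mdn)
print(mean > mdn)   # True for a right skew
```

This is why the median is preferred for skewed data: it is not dragged out by the tail.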

6/14/2011 Last night I read the instructor's portion of my text, and they wanted the instructor to reinforce the concept of "vague".

Journal entry: 6/28/2011 This week I completed the readings from last week, ran some frequencies with SPSS and tried some of the graphics, but most importantly I dug into the PEW study. First I gave myself the assignment of going through and labeling each variable as nominal, ordinal or continuous. I thought that would be pretty easy, and it was, except for a couple of them. I wrote out a key for myself to refer to as I did this so that I could trigger my own logic for each of the types. For nominal I asked myself if each possible answer had a category membership, if it was a label, or if it was an "is or isn't" type of question. For ordinal I asked myself if each score related to the other scores, such as greater/lesser and so forth. For scale I looked for an independent scale with units of measure: was each datum produced independent of the other scores? One of the questions asks how many hours the respondent engages in online activities. The original variable setting in the data set was "scale". I know that is wrong, because the answers were not in exact hours, but rather in ranges. So that sounds like ordinal. I think that is correct. I had written it down as nominal originally because of "none" and "more than 10" at the top of the choices. Those aren't really ranges, or maybe they are? (question 14)

Another question that I'm unsure about is question 26. It is a yes/no question (nominal) WITH qualifications (ordinal?): "yes, a lot more groups", "yes, just a few more groups", "no, internet has no impact on number of groups". Question 28 is similar with more/less/none. Question 35 (which I marked as nominal) also offers the respondent three "yes" answers with qualifications and one "no". So now I'm thinking it is ordinal.

What I also did was evaluate, as best I understand it from the reading and the notes, the methodology used. The POPULATION is all adults in the continental US who have access to a landline or cellular phone. I am deducing that the sample frame is all the telephone listings (???). The method was different for each type of telephone. For landlines they used PROBABILITY SAMPLING, then proportioned by active blocks in area codes/exchanges/blocks (could this be multi-stage clustering??). For cellular phones there is no list, so they used SYSTEMATIC SAMPLING of 100-blocks with no landline associated with them.

They did two things to correct for bias. I totally don't understand what they did for non-response bias. They said "weighting is generally used in survey analysis to compensate for sample designs and patterns of non-response that might bias results." I do understand why they had to account for the two different sampling methods. They used dual frame sampling, so they did two things: first, they made a probability selection adjustment, because a landline potentially reaches several adults, whereas a cellular phone reaches one adult. They also made a phone use adjustment to account for the possible overlapping of sample frames. They also had to do some balancing for demographics and used data from the 2000 US census.

I looked for questions that the data would evoke. Going back through the notes, I understand population/parameter and sample/statistic. Having a sampling frame seems ideal. I generally understand multi-stage clusters: there can be a variable number of "stages" that become less broad. I'm not clear how to determine the number of stages, and I think that you are conducting random sampling (or systematic sampling?) at each stage or cluster? I would like to consider using stratified sampling in my data set. Other than using strata of demographics, I haven't determined other characteristics that would be interesting to explore as strata. I want to keep this in mind as I continue to explore the data.

For some reason CENTRAL TENDENCY has been easiest to understand visually versus the numbers. By taking a normal distribution curve where mo = mdn = x and then transposing variations, I can get it visually. I think I'd like to have a conversation about what things might affect each of these values in different ways. We may have already, but I didn't grasp it totally. Because Dee stressed not to focus on the numbers/calculations, but rather on the deeper meaning, that is where I'd particularly like to understand these three values. NOTE TO SELF: Go back and look at visual representations of changes in central tendency. Dispersion, or range, affects the width of the bell curve. I have a note to include range in the frequency distribution calculation…

The SHAPE OF DATA can appear as a skew. I remember how to express this as right or left (I've always struggled with perceiving my left and right, so this is more confounding). Luckily for me the skew is expressed from the viewer's point of view. In the medical world it revolves around the patient's right or left, so I have a habit of double-checking this point; I have to let go of that. Anyway, an easy way for me to remember is that the skew is the "corkscrew" end. So a skew to the right means the bulk of the bell curve has shifted to the left, and vice versa. Again, I'd like to see what data would do this. When the data is skewed you want to choose the median; when it is symmetrical you can choose the mean for calculations.

Standard deviation and variance are a bit cloudy for me. I believe that SD is an expression of the variance, whether for the population or the sample. The square root of the variance is the SD. I really may need to go back and read some more on this to understand it visually. I know we are looking at the variance between each datum and the ??? mean. By calculating the SD we can express the variation of the data. I just got the hang of running contingency tables, and will write more about this next time. I believe you want to use Pearson's contingency coefficient to figure out if there is an association between categorical variables. I also had some initial questions for the data. I will write those in my next journal entry.
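A minimal sketch of a contingency table and Pearson's contingency coefficient, computed from its definition C = √(χ² / (χ² + n)). The 2×2 table below is hypothetical (counts invented, loosely imagined as internet use vs. group membership), not from the PEW data:

```python
# Hypothetical 2x2 contingency table: rows = internet user (yes/no),
# columns = belongs to a group (yes/no). Counts are invented.
import math

table = [[30, 10],
         [15, 25]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row_total * col_total / n under independence.
chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + n)).
# C near 0 means no association between the categorical variables.
C = math.sqrt(chi2 / (chi2 + n))
print(round(chi2, 2), round(C, 2))   # 11.43 0.35
```

In SPSS the same numbers come out of the Crosstabs procedure; this sketch just shows where they come from.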

Update: 6/29 I was just reading the SPSS Survival Guide, the measures of association section to be specific, and I realized I was incorrect about Pearson's r: it is used for continuous variables, not categorical. When Dee was explaining this concept of measures of association, I wasn't quite getting it, but I find it clearer now. These measures of association simply allow you to recognize when there is a relationship between variables and when there is not. When you find associations you can better predict how one variable will affect the other as you support your theories?
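Since Pearson's r is for continuous variables, here is a minimal sketch of it computed from the definition r = cov(x, y) / (sd(x) · sd(y)); the two variables and their values are invented, only loosely imagined as hours online vs. number of groups joined:

```python
# Pearson's r for two continuous variables (values are made up).
import math

x = [1, 2, 3, 4, 5]   # e.g. hours online per day (hypothetical)
y = [2, 4, 5, 4, 6]   # e.g. number of groups joined (hypothetical)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance and standard deviations (population form, dividing by n).
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

r = cov / (sd_x * sd_y)
print(round(r, 3))   # closer to +1 or -1 means a stronger linear association
```

An r near 0 would mean no linear relationship, which matches the idea above: finding an association is what lets you predict one variable from the other.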