University of Toronto. Data Library Service


UT/DLS: commonly asked questions

Contents: questions about the Data Library Service, statistics, data, software, miscellaneous. See also DIGRS blog [University of Alberta. Data Library]
    About the Data Library Service:
  1. Can you help me find articles and books on my topic?
  2. So when should I use the Data Library Service?
  3. How do I find out what data the Data Library has?
  4. How do I get the data files?
  5. Can't I just get the data on a floppy disk?
  6. But I just need a couple of figures to put in my report. Can't I just get them printed out?
  7. Will you do my statistical analysis for me?
  8. Do I have access to E-stat through UofT?
  9. Where do I get access to Datastream at UT?
  10. Where do I get access to the WRDS interface/database?
  11. You don't have the data file I need. What should I do?
  12. When will the _______ data (e.g. from Statistics Canada) be available?
  13. When I try to access the ___ database, it tells me I don't have permission.
  14. I can remember the [Canadian] survey name in French[English] but what is it in English[French]? [Last rev. 2005/09/19]
  15. What is <odesi> and when should I use it?
  16. How do I access <odesi> from off campus?
  17. What is SDA and when should I use it?
  18. How do I access the SDA server from off campus?
  19. How do I access data from ICPSR?
  20. How do I log in to ICPSR from off campus?
  21. How do I download DLI data from Statistics Canada? It tells me I need a user name and password.
    About Statistics:
  22. What's the difference between statistics and data?
  23. How do I get access to Statistics Canada statistics and data?
  24. How do I evaluate the quality of the statistics/data I found?
  25. How do I match Canadian postal codes to census profile data?
  26. How do I use the CPI to convert dollar amounts?
  27. How do I compute a fertility rate from aggregate data?
  28. How do I compute an unemployment rate?
  29. What's the difference between the unemployment rate, the participation rate and the employment rate?
  30. How do I rebase an index to another index year?
  31. What is a Gini coefficient?
  32. What's the difference between nominal and real GDP?
  33. What's the difference between the Paasche, Laspeyres, and Fisher price indices?
  34. Where can I find an explanation of the World Bank's classification of countries by income (GDP)?
  35. How do I compute energy losses from the available Canadian data?
  36. What do 'C$' and 'K$' mean?
    About Data:
  37. What's the difference between statistics and data?
  38. How do I search for data sets with specific concepts/variables?
  39. When should I use a weight variable, and what kind should I use when?
  40. Handling missing or incomplete data
  41. How do I evaluate the quality of the statistics/data I found?
  42. What is meant by sampling error?
  43. What is meant by non-sampling error?
  44. How do I read a hierarchical file into SPSS or Stata?
  45. I need to analyze change over time. Can you point me to some available data to get me started?
  46. What's the difference between public use microdata file (pumf) and the RDC versions of the 1996, 2001, and 2006 census of population microdata?
  47. What is a 'synthetic' or 'dummy' file?
    About Software:
  48. How do I read an MS Word, Excel, RTF etc. format file from my WWW browser?
  49. How do I read fixed-field or delimited data into SAS, SPSS, Shazam, Stata, etc.?
  50. How do I read a Beyond 20/20 file, or a file with an .ivt extension?
  51. Where do I get the Beyond 20/20 software?
  52. How do I read a Beyond 20/20 file on a MAC or Linux?
  53. What is the BOOTVAR program and where can I get a copy?
  54. How do I read this file into SHAZAM? It doesn't have blanks between the variables!
  55. Where do I get the Shazam software?
  56. How do I read the microdata output from the CHASS Census Analyzer into SPSS?
  57. What's the difference between IDLS from UWO, <odesi> and SDA (ie MDAS at CHASS)?
  58. How do I take a random sample of a large dataset in the SDA interface?
  59. How do I read the output from SDA into SPSS?
  60. How do I download a SAS/SPSS/Stata system file from <odesi>?
  61. How do I read an ascii data set into SPSS?
  62. How do I export a table from PDF into an Excel spreadsheet?
  63. How do I get a WRDS account?
    Miscellaneous:
  64. What is the difference between the GTA and the CMA of Toronto?
  65. How do I cite the data in my bibliography?
  66. Where can I find some resources on 'statistical literacy'?


    Q: Can you help me find articles and books on my topic?

    A: No, the DLS has no expertise with bibliographic databases, only with statistics and data. For help with selecting and accessing bibliographic databases, you should ask at the Reference Desk of your favourite branch library.


    Q: So when should I use the Data Library Service?

    A: When you need to do statistical analysis, for example, to test a hypothesis using existing data files or databases, or to replicate another researcher's statistical analysis, or when you simply require statistics or data that are not available in printed form.

    Similarly, if you have collected data yourself, and have published or are going to publish research based on that data, the DLS can assist you in making the data available to other researchers for secondary analysis.

    Alternatively, if you have received an SSHRC grant for research involving the collection of new data, the DLS can help you fulfill the deposit requirement in the SSHRC grant application. Similarly, if you have collected original research data with MRC, NSERC, or other grant funding, and wish to have your data archived and made available to other researchers, we can provide long-term archival management, as well as dissemination of the data as per your direction. See our web pages on data deposit and preservation.


    Q: How do I find out what data files the Data Library has?

    A:

    1. We have listings of major data collections by subject, as well as web sites listing Canadian, U.S., and international statistics, both in our collections and available elsewhere.

    2. Alternatively, search our web site (if you know the title and/or principal investigator) using our Google interface:
      Google Custom Search

    3. E-mail us and ask.

    4. If you cannot locate the data you are looking for, e-mail us and ask anyway. If we do not currently have it, we may be able to obtain it for you from another source.

    Unfortunately, at present, you can only search for words in the title or the name of the principal investigator(s). If you don't find any files of interest, please contact us, as we may be able to recommend an alternative.


    Q: How do I get the data files?

    A: You have a number of alternatives:

    1. Many files containing statistics, e.g. census statistics, are directly accessible, see:

    2. Many major microdata files are available for analysis or subsetting via the SDA interface.
      To request that additional files be made available via SDA, contact <laine.ruus@utoronto.ca>.

    3. Other data files must be requested. It will help expedite your data request if you can let us know:
      • The data file(s) you want (the actual physical filenames)
      • If you want a subset, exactly which variables/cases
      • What software you intend to use for your analysis (SAS, SPSS, Stata, etc.)
      • What platform you will be working on (CHASS, a Windows workstation, a Mac workstation, etc)
      • How you would like the data delivered (via SDA, ftp, or on a CD-R).

    4. Users at other institutions, or not affiliated with the University of Toronto, are welcome to consult the Data Library Service re obtaining copies of data files that we are able to redisseminate, or for information about other avenues for obtaining data that the Data Library is not allowed to disseminate.


    Q: Can't I just get the data on a floppy disk?

    A: Most data files in our collection are very large and complex, in the order of hundreds of megabytes, and in some cases gigabytes in size. For most users, especially those working on desktop workstations, doing any kind of analysis with very large data files would be slow and inefficient. It is much more efficient for you if we create a subset with just those variables and cases that you want to analyse, in a format that your software can easily read, thus allowing you to spend more of your time on analysis and less on data management.


    Q: But I just need a couple of figures to put in my report. Can't I just get them printed out?

    A: Most of the statistics that we have that are ready to be used in this way, you can access from our web page on: Finding Canadian statistics or Finding U.S. and international statistics.

    If you can't find the statistics you need there, then in some cases, it may be simpler and faster to locate the 'couple of figures' in a printed source, than to attempt to generate them from the data files. Most of our data files are simply flat ascii files, with no retrieval software, and extracting data from them requires that you have access to statistical software such as SAS, SPSS, SHAZAM, Stata, etc., and write a program using that statistical software to generate the statistics you need.
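
    To give a sense of what 'writing a program' against a flat ascii file involves, here is a minimal sketch in Python rather than SAS, SPSS, SHAZAM or Stata. The record layout, the field names and the three sample records are all hypothetical, made up for illustration; a real file's column positions come from its codebook.

```python
import io
from statistics import mean

# Hypothetical record layout (NOT from any real codebook):
# columns 1-4 = respondent id, columns 5-6 = age, columns 7-12 = income.
LAYOUT = {"id": (0, 4), "age": (4, 6), "income": (6, 12)}


def read_fixed_width(lines, layout):
    """Slice each fixed-width record into a dict of integer fields."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        records.append(
            {name: int(line[start:end]) for name, (start, end) in layout.items()}
        )
    return records


# Made-up sample records standing in for a flat ascii data file.
sample = io.StringIO("000134045000\n000251062500\n000328038000\n")
records = read_fixed_width(sample, LAYOUT)

# A descriptive statistic generated from the raw records.
average_income = mean(r["income"] for r in records)
print(f"{len(records)} records, mean income = {average_income:.2f}")
```

    With a real file you would open it with open() instead of io.StringIO, and would usually also need to apply weights and handle missing-value codes before computing anything.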


    Q: Will you do my statistical analysis for me?

    A: No, we are not able to provide statistical consultation. Your alternatives are:
    - if you are an undergraduate student, ask your professor or class TA,
    - if you are a graduate student, faculty member or researcher, you may wish to consult the Statistical Consulting Service offered by the Department of Statistics.


    Q: You don't have the data file I need. What should I do?

    A: If the data are available from one of our normal data suppliers (such as ICPSR, Statistics Canada through the Data Liberation Initiative, the Roper Center, etc.), we will be able to acquire them for you quickly, or help you download them yourself, at no cost.

    Even if the data are not available from one of our suppliers, the Data Library Service does have a budget for data acquisition. If the data are likely to be of interest to other researchers or students, and if the DLS budget can afford it, it is very possible that we may acquire it for you.

    In either case, please contact Laine G.M. Ruus to enquire about getting the data.

    The more information you can provide about the data you need, the better.


    Q: How do I read an RTF format file?

    A: RTF is a text file format, and the acronym stands for 'Rich Text Format'. The format can be read by most major word processors.

    To read an RTF file in Netscape:
       From the menu bar select Options/General Preferences/Helpers
       Find RTF in the list of file types
       Click on and edit that file type to open the file in a word processor of your choice (that can read RTF format, of course). If you need to add RTF to the list of helper applications, 'description' is any text you wish, 'mime' is 'application/rtf', and 'suffix' is 'rtf'.

    To read an RTF file in Internet Explorer:
       From the menu bar select View/Options/Programs
       Find RTF in the list of file types
       Click on and edit that file type to open the file in a word processor of your choice (that can read RTF format, of course). If you need to add RTF to the list of helper applications, 'description' is any text you wish, 'mime' is 'application/rtf', and 'suffix' is 'rtf'.


    Q: How do I read a Beyond 20/20 file?

    A: Beyond 20/20 is a commercial data browser tool.

    See our guide to getting and using the Beyond 20/20 software at: <http://www.chass.utoronto.ca/datalib/caq/b2020.htm>.


    Q: How do I read the microdata output from the CHASS Census Analyzer into SPSS?

    A: This involves two steps: first, selecting and saving the appropriate output from the Census Analyzer, and secondly, reading it correctly into SPSS.

    Step 1: select the appropriate output formats.

    • At the CHASS microdata page:
      • Select the appropriate year of microdata,
      • Select desired variables in the selection box (hold down the Ctrl key to select non-adjacent variables),
      • Click on Submit query
      • Select output format: SPSS
      • Select additional options as appropriate, and click on Submit query
    • Once output is complete, on the Netscape menu bar select File/Save As
      • Select Save as type: Plain Text (*.txt)
      • Assign a filename ending with the extension .sps, e.g. 96micro.sps.
      • Click on the Save button.

    Step 2: read the file in SPSS for Windows (these instructions are for version 9.0)

    • Load SPSS for Windows 9.0
    • Select File/Open
    • Select Files of type: Syntax (*.sps)
    • Give as appropriate the path and filename of the file saved in Step 1 above.
    • Click on Open. This should open the SPSS Syntax Editor window, and display the full file saved.
    • From the menu bar, select Run/To end. SPSS will load the data into the Data editor display window, complete with variable and value labels as appropriate.

    Alternative Step 2: read the file in SPSS for Unix (These instructions are for version 6.0)

    • Upload the file saved in Step 1 to a Unix platform that has the SPSS software. To upload the file, use an ftp client, such as WS_FTP or Rapid Filer, and remember to upload the file in ASCII format.
    • Add as appropriate analytic or save/export commands at the end of the file, before the FINISH command.
    • Run SPSS in the usual way, with the uploaded file as batch input, e.g. spss -m [filename].sps > [filename].lst &


    Q: When I try to access the ___ database, it tells me I don't have permission.

    A: Most databases acquired by the Data Library Service are restricted to access by University of Toronto faculty, students, and staff, for academic research and teaching purposes only. Therefore, access to the data is restricted to University of Toronto IP addresses only. This is a requirement of our contracts with our data suppliers.

    If you are currently a University of Toronto faculty, student, or staff member, and are working off-campus, you will need to log in with your UTORid before attempting to access restricted data.

    Alternatively, prepend the following character string at the beginning of the URL you are attempting to access:

       http://myaccess.library.utoronto.ca/login?url=

    E.g. http://myaccess.library.utoronto.ca/login?url=http://estat.statcan.ca

    This will take you via the myaccess login to the URL you have requested.
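
    If you need to do this for several URLs, the prepending step can be scripted. The following is a minimal sketch; the helper name via_myaccess is hypothetical, and only the prefix string is taken from the answer above.

```python
# Prefix string taken from the answer above.
MYACCESS_PREFIX = "http://myaccess.library.utoronto.ca/login?url="


def via_myaccess(url: str) -> str:
    """Return the URL routed through the myaccess login (hypothetical helper)."""
    if url.startswith(MYACCESS_PREFIX):
        return url  # already prefixed; avoid doubling the prefix
    return MYACCESS_PREFIX + url


print(via_myaccess("http://estat.statcan.ca"))
```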


    Q: Where do I get the Beyond 20/20 software?

    A: If you are a University of Toronto, Ryerson University, or York University faculty, student, or staff member, you can download the Beyond 20/20 software from: <http://www.chass.utoronto.ca/datalib/caq/b2020.htm>.

    We do have a University-wide license to use Beyond 20/20 for academic research and teaching purposes, under the provisions of the Statistics Canada Data Liberation Initiative.
    [rev. 2009-10-22]


    Q: Where do I get the Shazam software?

    A: If you are a University of Toronto faculty, student, or staff member, you can get the Shazam software from the Economics Department, which holds a University site licence for the software.

    Contact: Ursula Gutenburg, the Economics Department Librarian, Office: 150 St. George Street, N107. Phone: 416-978-8623. Fax: 416-978-6713. Email address: ecolib@chass.utoronto.ca. The cost is minimal ... about $10.

    For users with CHASS accounts, Shazam is also available on the 'bebop' platform.

    If you are not a University of Toronto faculty, student, or staff member, the Shazam home page is at: <http://shazam.econ.ubc.ca/>.


    Q: How do I use the CPI to convert dollar amounts?

    A: See:


    Q: How do I compute a fertility rate from aggregate data?

    A: See:


    Q: How do I compute an unemployment rate?

    A: Statistics Canada computes the unemployment rate as follows:

    unemployment rate = (unemployed labour force / total labour force) * 100
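
    As a worked illustration of this formula, here is a minimal Python sketch. The function name and the counts are made up for the example, and it assumes that the total labour force is the sum of the employed and the unemployed.

```python
def unemployment_rate(unemployed: float, labour_force: float) -> float:
    """Unemployed as a percentage of the total labour force."""
    if labour_force <= 0:
        raise ValueError("labour force must be positive")
    return unemployed * 100 / labour_force


# Made-up counts, in thousands of persons.
employed = 950
unemployed = 50
labour_force = employed + unemployed  # assumption: employed + unemployed

rate = unemployment_rate(unemployed, labour_force)
print(f"unemployment rate = {rate:.1f}%")
```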


    Q: How do I cite the data in my bibliography?

    A: See:


    Q: How do I evaluate the quality of the statistics/data I found?

    A: See:


    Q: Do I have access to E-stat through UofT?

    A: Yes, as of approximately 2009, the E-stat web site is completely unrestricted. However, you must agree to Statistics Canada's licence agreement in order to access the resources on the E-stat server. [2009-11-18]


    Q: What's the difference between statistics and data?

    A: I use these two words as follows. Statistics are pre-processed numbers, often called 'descriptive statistics' or 'aggregate statistics', that summarize the characteristics of a group of observations, such as a population, phenomena (eg the weather, stock prices), etc. Examples of descriptive statistics: (1) the population of Toronto in 2001 was 4,682,900 [a count]; (2) the average cost of running a car in Alberta in 1996 was $6,041 [an average or mean]; (3) the cost of Internet access in Newfoundland rose by 2.7% between January 2003 and February 2004 [computed from an index, the CPI]; (4) the average temperature in Toronto in February 2008 [another average], etc. Descriptive (aggregate) statistics often take the form of counts (totals), percentages, means (averages) and medians, rates, or indices. These are the statistics generally published in statistical publications of various kinds and media. Descriptive statistics are usually appropriate answers to questions that ask 'how many/how much'.

    Data, on the other hand, are the raw material from which statistics are created, and more often used to answer questions that ask 'how' or 'why'. Data are raw, unsummarized characteristics as originally collected. The data (or microdata) from which the above examples of statistics were created are: (1) the database of records (one per person) of all persons in Toronto on census day 2001; (2) the actual expenditures on automobile maintenance by each household in Alberta that was interviewed in the Survey of family expenditures, 1996; (3) the actual prices of the goods in the basket used to compute the CPI, in each month in Newfoundland; (4) the temperature as recorded each hour of each day, at all environmental data stations in Toronto, each day in February 2008.

    Data by themselves don't really mean much, and need to be statistically manipulated, usually by specialized software, in order to generate descriptive or inferential statistics. The answer that I gave to the 'age' question on my 2001 census questionnaire is a 'datum', and in and of itself is only of interest to myself and the Canada Pension Plan; but in conjunction with the combined economic assets, recreational habits, and general health of my age cohort, it becomes a research resource for planning retirement policy, senior support services in municipalities across Canada, new housing construction, or alternatively, the abolition of compulsory retirement.

    Inferential statistics on the other hand, are statistical measures of significance, of direction, and of magnitude. They are generally used to answer such questions as, "If, based on the available data, I conclude that on average, the more education you have the more income you will have, what are my chances of having drawn an incorrect conclusion, ie what are my chances of being right or wrong? And if I further conclude that each additional university degree adds approximately $10,000 to your annual income, how accurate an estimate is that?"
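
    The distinction can be sketched in a few lines of Python. The temperature readings below are made up, and the standard error is only meaningful under the assumption that the readings can be treated as a simple random sample; the point is just to show raw data, a descriptive statistic, and a rough inferential measure side by side.

```python
from math import sqrt
from statistics import mean, stdev

# Data: raw, unsummarized observations (made-up hourly temperatures, in Celsius).
hourly_temps = [-5.0, -3.5, -4.0, -6.5, -2.0, -3.0]

# Descriptive statistic: a single number summarizing the observations.
average_temp = mean(hourly_temps)

# A simple inferential measure: the standard error of that mean, i.e. a rough
# indication of how precise the average is as an estimate
# (assumes the readings behave like a simple random sample).
standard_error = stdev(hourly_temps) / sqrt(len(hourly_temps))

print(f"mean = {average_temp:.2f}, standard error = {standard_error:.2f}")
```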


    Q: How do I rebase an index to another index year?

    A: Use the following formula to compute a ratio: ratio = [new index at time1]/[old index at time1]

    Then multiply all data points on the old index by this ratio.

    Eg. A new index has base year 1995=100; the previous version of the index had base year 1985. The index value for 1995 (on the previous version of the index) was 127.5. Multiply values on the old index by 100/127.5=.7843 to convert them to the new base year.
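
    The same calculation as a minimal Python sketch. The 1995 value of 127.5 comes from the example above; the 1990 value and the function name are made up for illustration.

```python
def rebase(old_index: dict, new_base_year: int, new_base_value: float = 100.0) -> dict:
    """Rebase an index series so that new_base_year equals new_base_value."""
    # ratio = [new index at time1] / [old index at time1]
    ratio = new_base_value / old_index[new_base_year]
    return {year: value * ratio for year, value in old_index.items()}


# Old index with base year 1985=100. The 1990 value is made up.
old_index = {1985: 100.0, 1990: 115.0, 1995: 127.5}

new_index = rebase(old_index, new_base_year=1995)
for year, value in sorted(new_index.items()):
    print(year, round(value, 2))
```

    Because every value is multiplied by the same ratio, percentage changes between any two years are the same on the old and the new index.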


    Q: What's the difference between the unemployment rate, the participation rate and the employment rate?

    A: Statistics Canada computes these rates as follows:
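
    As a minimal sketch, assuming the standard Labour Force Survey definitions (the labour force is the employed plus the unemployed, and the participation and employment rates use the population aged 15 and over as the denominator), the three rates can be computed as below. The function name and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabourRates:
    unemployment_rate: float   # unemployed as % of the labour force
    participation_rate: float  # labour force as % of the population aged 15+
    employment_rate: float     # employed as % of the population aged 15+


def labour_rates(employed: float, unemployed: float, population_15_plus: float) -> LabourRates:
    """Compute the three rates from counts (assumed definitions, see lead-in)."""
    labour_force = employed + unemployed
    return LabourRates(
        unemployment_rate=unemployed * 100 / labour_force,
        participation_rate=labour_force * 100 / population_15_plus,
        employment_rate=employed * 100 / population_15_plus,
    )


# Made-up counts, in thousands of persons.
rates = labour_rates(employed=600, unemployed=40, population_15_plus=1000)
print(rates)
```

    Note that the unemployment rate has a different denominator from the other two, which is why the employment rate and the unemployment rate do not add up to 100.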


    Q: What's the difference between nominal and real GDP?

    A: Nominal GDP is GDP at current prices. Real GDP is GDP at constant prices, i.e. in effect adjusted for price changes by a price deflator such as the CPI.
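
    A minimal sketch of the adjustment in Python, assuming the deflator is expressed as an index with its base year equal to 100. The function name and the figures are made up for illustration.

```python
def real_gdp(nominal_gdp: float, deflator: float, base: float = 100.0) -> float:
    """Convert GDP at current prices to GDP at base-year (constant) prices."""
    if deflator <= 0:
        raise ValueError("deflator must be positive")
    return nominal_gdp * base / deflator


# Made-up figures: nominal GDP of 1200 (billions of dollars) in a year
# when the price index stands at 120 (base year = 100).
value = real_gdp(nominal_gdp=1200.0, deflator=120.0)
print(f"real GDP = {value:.1f}")
```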


    Q: How do I read a hierarchical file into SPSS or Stata?

    A: A useful outline of the process of working with hierarchical microdata files in SPSS and Stata is available in:
    Working with survey files: using hierarchical data, matching files and pooling data. Rafferty, Anthony & Jo Wathan, ESDS, March 2008.
    Note that the focus of this document is on British data, not Canadian.


    Q: I need to analyze change over time. Can you point me to some available data resources to get me started?

    A: You will find links to resources on repeating cross-sectional and panel surveys at: <http://www.chass.utoronto.ca/datalib/major/long.htm>


    Q: Where can I find some resources on 'statistical literacy'?

    A: Some resources we have found useful are:


    Q: What's the difference between public use microdata file (pumf) and the RDC versions of the 1996, 2001, and 2006 census of population microdata?

    A: According to a dlilist message dated June 5, 2007:

    The 2001 Census data will be available in the RDCs at the end of June 2007; the 1996 and 2006 census data will be available in the RDCs by December 2008. The census tract will be the minimum level of geography available for urban areas and the census subdivision is the lower level for rural areas. Researchers who wish to access these detailed microdata for their research program need to submit their proposals to SSHRC.

    The RDC program has also provided us with the following overview of the major differences between RDC file and the 2001 PUMF:

    1. The RDC file is one file with four census universes (population, family, household, dwelling) linked to permit analysis at all levels, including multilevel analysis.
    2. It is the full 2B file which is a 20% sample (6 080 919 records). The file excludes institutional residents, residents of incompletely enumerated Indian reserves or Indian settlements, and foreign residents, namely foreign diplomats, members of the Armed Forces of another country who are stationed in Canada, and residents of another country who are visiting Canada temporarily.
    3. It has geography at the CT or CSD level where CT is not available (i.e. rural areas)
    4. Provides data directly from the questionnaire and allows researchers to build their own variables.
    5. There is no capping or categorization of continuous variables.
    6. Access to the RDC file requires an approved study proposal. Enhanced census confidentiality rules are applied.


    Html by Laine G.M. Ruus, Data Library Service, University of Toronto
    Last updated: 2010-02-03