A Data Package is a collection of
files that consists of both data, which can be any type of information
such as images and CSV files, and meta data. These files are usually
stored in one directory (possibly with subdirectories), although links
to external data are possible. Meta data is data about data: it consists
of the information software needs to use the data and the information
users of the data need, such as descriptions, names of authors and
licences. The meta data is stored in a file in the
directory that is usually called datapackage.json. The
information in this file is what will be called the Data Package below.
It contains both information on the data package itself
(title, description) and information on a number of Data Resources. The
Data Resources describe the data files in the data package. They
contain information such as a title and description, but also information
needed by software to use the data, such as the path to the data
(the location of the data) and technical information such as how the data
is stored. This information makes it easier to use the data. Below we
show how to use the information in a Data Package to read in the data
and work with it, and how to create a Data Package for our own data.
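To make this structure concrete, the sketch below mirrors what the datapackage.json of the example used later might contain, written as a nested R list. The property values are taken from the printed output further down; the exact layout of the real file is an assumption, and the variable names descriptor and resource are only used for illustration.

```r
# Hypothetical, simplified descriptor: package-level meta data plus one
# Data Resource. The real datapackage.json may contain more properties.
descriptor <- list(
  name = "example",
  title = "Example data set for the datapackage package",
  resources = list(
    list(
      name = "employment",
      title = "Employment status",
      path = "employ.csv",   # location of the data, relative to the package
      format = "csv",
      mediatype = "text/csv",
      encoding = "utf-8",
      schema = list(
        fields = list(
          list(name = "id", type = "integer"),
          list(name = "dob", type = "date")
        )
      )
    )
  )
)

# Package-level information
descriptor$title

# Resource-level information that software needs to find and read the data
resource <- descriptor$resources[[1]]
resource$path
resource$encoding
```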
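Below is an overview of some of the terminology associated with Data Packages.
Data Package: can have properties like title, name and description.
Data Resource: contains data, either inline in a data property or as external data pointed to by a path property. It can have properties like title, name, encoding, …
Tabular Data Resource: a data set with rows and columns (like a data.frame in R).
Field Descriptor: describes a field (column) and has properties like name and type.
open_datapackage() reads the meta data from the
datapackage.json. From the output below you can see that
the data package has three data resources.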
> library(datapackage, warn.conflicts = FALSE)
> dir <- system.file("examples/employ", package = "datapackage")
> dp <- open_datapackage(dir)
> dp
[example] Example data set for the datapackage package
This is an example data set to show how the datapackage package can be used to import data into R. [...]
Location: </tmp/RtmpxawnAg/Rinst29a841d0bc74/datapackage/examples/employ>
Resources:
[employment] Employment status
[codelist-gender] Code list for gender of person
[codelist-employ] Code list for employment status
To read the data belonging to one of the data resources:
> dta <- dp |> dp_resource("employment") |> dp_get_data()
> dta
          id        dob gender employ  income haspartner
1  368509515 1993-06-14      M      E 2691.80      FALSE
2  187844355 1961-10-08      X      U      NA      FALSE
3  273040044 1982-06-17      F      E  533.65      FALSE
4  963831798 1965-02-15      M      E  790.13      FALSE
5  854856378 1990-01-30      F      E  716.79       TRUE
6   20072760 1961-08-18      F      E 1651.60      FALSE
7  429782078 2019-09-23      M      N      NA      FALSE
8  711292034 1994-02-16      M      E  455.98      FALSE
9  949458305 2005-06-16      F      N      NA      FALSE
10 911459071 2007-02-27      F      N      NA      FALSE
11 921370403 1981-09-14      F      E 1461.36      FALSE
12  26901869 1981-03-08      F      E 1153.02       TRUE
13 640668848 1993-08-30      M      E  708.98       TRUE
14 996464509 1960-12-10      M      E 1088.99      FALSE
15  58820512 1962-10-13      M      E 2243.10      FALSE
16 288242988 2013-06-11      M      N      NA      FALSE
17 549758863 1990-06-09      M      E  719.69       TRUE
18 998045846 1973-01-25      F      E 1312.45      FALSE
19 902078272 1962-05-29      F      E  618.27      FALSE
20 594477489 1952-08-31      -      N      NA       TRUE
When the name of the data resource is known, the data can also be read directly from the data package without explicitly opening the Data Package:
> dta <- dp_load_from_datapackage(dir, "employment")
> dta
          id        dob gender employ  income haspartner
1  368509515 1993-06-14      M      E 2691.80      FALSE
2  187844355 1961-10-08      X      U      NA      FALSE
3  273040044 1982-06-17      F      E  533.65      FALSE
4  963831798 1965-02-15      M      E  790.13      FALSE
5  854856378 1990-01-30      F      E  716.79       TRUE
6   20072760 1961-08-18      F      E 1651.60      FALSE
7  429782078 2019-09-23      M      N      NA      FALSE
8  711292034 1994-02-16      M      E  455.98      FALSE
9  949458305 2005-06-16      F      N      NA      FALSE
10 911459071 2007-02-27      F      N      NA      FALSE
11 921370403 1981-09-14      F      E 1461.36      FALSE
12  26901869 1981-03-08      F      E 1153.02       TRUE
13 640668848 1993-08-30      M      E  708.98       TRUE
14 996464509 1960-12-10      M      E 1088.99      FALSE
15  58820512 1962-10-13      M      E 2243.10      FALSE
16 288242988 2013-06-11      M      N      NA      FALSE
17 549758863 1990-06-09      M      E  719.69       TRUE
18 998045846 1973-01-25      F      E 1312.45      FALSE
19 902078272 1962-05-29      F      E  618.27      FALSE
20 594477489 1952-08-31      -      N      NA       TRUE
With the convert_categories argument, categorical
variables can be converted to factor:
> dta <- dp_load_from_datapackage(dir, "employment", 
+   convert_categories = "to_factor")
> dta
          id        dob  gender                 employ  income haspartner
1  368509515 1993-06-14    Male               Employed 2691.80      FALSE
2  187844355 1961-10-08   Other             Unemployed      NA      FALSE
3  273040044 1982-06-17  Female               Employed  533.65      FALSE
4  963831798 1965-02-15    Male               Employed  790.13      FALSE
5  854856378 1990-01-30  Female               Employed  716.79       TRUE
6   20072760 1961-08-18  Female               Employed 1651.60      FALSE
7  429782078 2019-09-23    Male Non-working-population      NA      FALSE
8  711292034 1994-02-16    Male               Employed  455.98      FALSE
9  949458305 2005-06-16  Female Non-working-population      NA      FALSE
10 911459071 2007-02-27  Female Non-working-population      NA      FALSE
11 921370403 1981-09-14  Female               Employed 1461.36      FALSE
12  26901869 1981-03-08  Female               Employed 1153.02       TRUE
13 640668848 1993-08-30    Male               Employed  708.98       TRUE
14 996464509 1960-12-10    Male               Employed 1088.99      FALSE
15  58820512 1962-10-13    Male               Employed 2243.10      FALSE
16 288242988 2013-06-11    Male Non-working-population      NA      FALSE
17 549758863 1990-06-09    Male               Employed  719.69       TRUE
18 998045846 1973-01-25  Female               Employed 1312.45      FALSE
19 902078272 1962-05-29  Female               Employed  618.27      FALSE
20 594477489 1952-08-31 Unknown Non-working-population      NA       TRUE
Or, they can be converted to the code
class from the codelist package. This will preserve
both the codes and the labels:
> library(codelist)
> dta <- dp_load_from_datapackage(dir, "employment", 
+   convert_categories = "to_code")
> dta
          id        dob     gender      employ  income haspartner
1  368509515 1993-06-14 M[Male]    E[Employed] 2691.80      FALSE
2  187844355 1961-10-08 X[Other]   U[Unemplo…]      NA      FALSE
3  273040044 1982-06-17 F[Female]  E[Employed]  533.65      FALSE
4  963831798 1965-02-15 M[Male]    E[Employed]  790.13      FALSE
5  854856378 1990-01-30 F[Female]  E[Employed]  716.79       TRUE
6   20072760 1961-08-18 F[Female]  E[Employed] 1651.60      FALSE
7  429782078 2019-09-23 M[Male]    N[Non-wor…]      NA      FALSE
8  711292034 1994-02-16 M[Male]    E[Employed]  455.98      FALSE
9  949458305 2005-06-16 F[Female]  N[Non-wor…]      NA      FALSE
10 911459071 2007-02-27 F[Female]  N[Non-wor…]      NA      FALSE
11 921370403 1981-09-14 F[Female]  E[Employed] 1461.36      FALSE
12  26901869 1981-03-08 F[Female]  E[Employed] 1153.02       TRUE
13 640668848 1993-08-30 M[Male]    E[Employed]  708.98       TRUE
14 996464509 1960-12-10 M[Male]    E[Employed] 1088.99      FALSE
15  58820512 1962-10-13 M[Male]    E[Employed] 2243.10      FALSE
16 288242988 2013-06-11 M[Male]    N[Non-wor…]      NA      FALSE
17 549758863 1990-06-09 M[Male]    E[Employed]  719.69       TRUE
18 998045846 1973-01-25 F[Female]  E[Employed] 1312.45      FALSE
19 902078272 1962-05-29 F[Female]  E[Employed]  618.27      FALSE
20 594477489 1952-08-31 -[Unknown] N[Non-wor…]      NA       TRUE
When the data resource name is omitted from
dp_load_from_datapackage(), either the data resource with the
same name as the data package or the first data resource is opened.
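The selection rule described above can be sketched in a few lines of base R. The helper default_resource() is hypothetical and only illustrates the rule; it is not a function from the datapackage package.

```r
# Hypothetical helper illustrating the rule: prefer the resource that shares
# its name with the package, otherwise fall back to the first resource.
default_resource <- function(package_name, resource_names) {
  if (package_name %in% resource_names) {
    package_name
  } else {
    resource_names[1]
  }
}

resources <- c("employment", "codelist-gender", "codelist-employ")

# No resource is called "example", so the first resource is selected
default_resource("example", resources)

# A resource with the same name as the package takes precedence
default_resource("codelist-gender", resources)
```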
Below we open an example Data Package that comes with the package:
> library(datapackage, warn.conflicts = FALSE)
> dir <- system.file("examples/employ", package = "datapackage")
> dp <- open_datapackage(dir)
> dp
[example] Example data set for the datapackage package
This is an example data set to show how the datapackage package can be used to import data into R. [...]
Location: </tmp/RtmpxawnAg/Rinst29a841d0bc74/datapackage/examples/employ>
Resources:
[employment] Employment status
[codelist-gender] Code list for gender of person
[codelist-employ] Code list for employment status
The print statement shows the name of the package
(example), the title, the first paragraph of the description,
the location of the Data Package and the Data Resources in the package.
In this case there are three Data Resources:
> dp_nresources(dp)
[1] 3
The names are:
> dp_resource_names(dp)
[1] "employment"      "codelist-gender" "codelist-employ"
Using the dp_resource() method on the Data Package we can
obtain a Data Resource:
> employ <- dp_resource(dp, "employment")
> employ
[employment] Employment status
Employment status, income and background properties of sample of persons. [...]
Fields:
[id] <integer> Random personal identifier
[dob] <date> Date of Birth
[gender] <string/categories> Gender
[employ] <string/categories> Employment status
[income] <number> Net income
[haspartner] <boolean> Does person have a partner?
Selected properties:
path     :"employ.csv"
format   :"csv"
mediatype:"text/csv"
encoding :"utf-8"
dialect  :List of 3
The print statement again shows the name, title and
description. It also shows that the data is in a CSV file named
employ.csv. By default the print shows only a
few properties of the Data Resource. To show all properties:
> print(employ, properties = NA)
[employment] Employment status
Employment status, income and background properties of sample of persons. [...]
Fields:
[id] <integer> Random personal identifier
[dob] <date> Date of Birth
[gender] <string/categories> Gender
[employ] <string/categories> Employment status
[income] <number> Net income
[haspartner] <boolean> Does person have a partner?
Selected properties:
path     :"employ.csv"
format   :"csv"
mediatype:"text/csv"
encoding :"utf-8"
dialect  :List of 3
Using this information it should be possible to open the dataset. The
data can be opened in R using the dp_get_data() method.
Based on the information in the Data Resource this function will try to
open the dataset using the correct functions in R (in this case
read.csv()):
> dta <- dp_get_data(employ)
> head(dta)
         id        dob gender employ  income haspartner
1 368509515 1993-06-14      M      E 2691.80      FALSE
2 187844355 1961-10-08      X      U      NA      FALSE
3 273040044 1982-06-17      F      E  533.65      FALSE
4 963831798 1965-02-15      M      E  790.13      FALSE
5 854856378 1990-01-30      F      E  716.79       TRUE
6  20072760 1961-08-18      F      E 1651.60      FALSE
It is also possible to import the data directly from the Data Package object by specifying the resource for which the data needs to be imported.
> dta <- dp_get_data(dp, "employment")
The dp_get_data() method only supports a limited set of
data formats. It is also possible to provide a custom function to read
the data using the reader argument of
dp_get_data(). However, it is also possible to import the
data ‘manually’ using the information in the Data Package. The path of
the file in a Data Resource can be obtained using the
dp_path() method:
> dp_path(employ)
[1] "employ.csv"
By default this will return the path as defined in the Data Package.
This is either a path relative to the directory in which the Data Package
is located or a URL. To open a file inside the Data Package one also
needs the location of the Data Package. Using the
full_path = TRUE argument, dp_path() will
return the full path to the file:
> fn <- dp_path(employ, full_path = TRUE)
This path can be used to open the file manually:
> dta <- read.csv2(fn)
> head(dta)
         id        dob gender employ   income haspartner
1 368509515 1993-06-14      M      E €2 691,8          N
2 187844355 1961-10-08      X      U     <NA>          N
3 273040044 1982-06-17      F      E  €533,65          N
4 963831798 1965-02-15      M      E  €790,13          N
5 854856378 1990-01-30      F      E  €716,79          Y
6  20072760 1961-08-18      F      E €1 651,6          N
First, note that we had to ‘know’ that we had to use
read.csv2 since the file uses the ‘;’ as field
separator. Information like this is stored in the ‘dialect’ property of
a data resource:
> dp_property(employ, "dialect")
$decimalChar
[1] ","
$delimiter
[1] ";"
$nullSequence
[1] "NA"
Second, note that the field ‘income’ is not converted to numeric, as this field contains euro symbols and uses a space as thousands separator. Information like this is stored in the field descriptor:
> dp_field(employ, "income")
[income] <number> Net income
Selected properties:
bareNumber :FALSE
decimalChar:","
groupChar  :" "
dp_get_data() uses the information from the field
descriptors and dialect to automatically convert variables as much as
possible to their most fitting R types. This is done using the
dp_apply_schema() function:
> dp_apply_schema(dta, employ)
          id        dob gender employ  income haspartner
1  368509515 1993-06-14      M      E 2691.80      FALSE
2  187844355 1961-10-08      X      U      NA      FALSE
3  273040044 1982-06-17      F      E  533.65      FALSE
4  963831798 1965-02-15      M      E  790.13      FALSE
5  854856378 1990-01-30      F      E  716.79       TRUE
6   20072760 1961-08-18      F      E 1651.60      FALSE
7  429782078 2019-09-23      M      N      NA      FALSE
8  711292034 1994-02-16      M      E  455.98      FALSE
9  949458305 2005-06-16      F      N      NA      FALSE
10 911459071 2007-02-27      F      N      NA      FALSE
11 921370403 1981-09-14      F      E 1461.36      FALSE
12  26901869 1981-03-08      F      E 1153.02       TRUE
13 640668848 1993-08-30      M      E  708.98       TRUE
14 996464509 1960-12-10      M      E 1088.99      FALSE
15  58820512 1962-10-13      M      E 2243.10      FALSE
16 288242988 2013-06-11      M      N      NA      FALSE
17 549758863 1990-06-09      M      E  719.69       TRUE
18 998045846 1973-01-25      F      E 1312.45      FALSE
19 902078272 1962-05-29      F      E  618.27      FALSE
20 594477489 1952-08-31      -      N      NA       TRUE
Finally, note that the path property of a Data Resource
can be a vector of paths in case a single data set is stored in a set of
files. It is assumed then that the files have the same format.
Therefore, rbind should work on these files.
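As a rough illustration of this manual route, the sketch below uses only base R: it reads a data set that is split over two files, combines the parts with rbind() and converts an income column formatted like the one above. The file contents and the helper parse_number() are made up for this example and do not show how the datapackage package does the conversion internally.

```r
# Two made-up files with the same layout: ';' as field separator and an
# income column with a euro sign, ' ' as group character and ',' as decimal
# character ("\u20ac" is the euro sign).
dir <- tempfile()
dir.create(dir)
paths <- file.path(dir, c("part1.csv", "part2.csv"))
part1 <- c("id;income", "1;\u20ac2 691,8", "2;NA")
part2 <- c("id;income", "3;\u20ac533,65")
writeLines(enc2utf8(part1), paths[1], useBytes = TRUE)
writeLines(enc2utf8(part2), paths[2], useBytes = TRUE)

# Read every file in the same way and stack the results
parts <- lapply(paths, read.csv2, encoding = "UTF-8",
  colClasses = c("integer", "character"))
dta <- do.call(rbind, parts)

# Hypothetical helper: convert formatted numbers using the group and decimal
# characters given in the field descriptor
parse_number <- function(x, decimal_char = ",", group_char = " ") {
  x <- gsub(group_char, "", x, fixed = TRUE)    # drop the thousands separator
  x <- sub(decimal_char, ".", x, fixed = TRUE)  # use '.' as decimal mark
  x <- gsub("[^0-9.-]", "", x)                  # drop currency symbols etc.
  as.numeric(x)
}

dta$income <- parse_number(dta$income)
dta
```

Because both files are read with the same arguments, the resulting data frames have identical columns, which is what makes rbind() applicable.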
Below is an alternative way of importing the data belonging to a Data Resource. Here we use the pipe operator to chain the various commands to import the data set.
> dta <- dp_resource(dp, "employment") |> dp_get_data()
> head(dta)
         id        dob gender employ  income haspartner
1 368509515 1993-06-14      M      E 2691.80      FALSE
2 187844355 1961-10-08      X      U      NA      FALSE
3 273040044 1982-06-17      F      E  533.65      FALSE
4 963831798 1965-02-15      M      E  790.13      FALSE
5 854856378 1990-01-30      F      E  716.79       TRUE
6  20072760 1961-08-18      F      E 1651.60      FALSE
For many of the standard fields of a Data Package, methods are defined to obtain the values of these fields:
> dp_name(dp)
[1] "example"
> dp_description(dp)
[1] "This is an example data set to show how the datapackage package can be used to import data into R.\n\nIts main data resource is the `employ` resource which contains fictional data about individuals. The other data resources are supporting data sets."
> dp_description(dp, first_paragraph = TRUE)
[1] "This is an example data set to show how the datapackage package can be used to import data into R."
> dp_title(dp)
[1] "Example data set for the datapackage package"
The same holds for Data Resources:
> dp_title(employ)
[1] "Employment status"
> dp_resource(dp, "codelist-employ") |> dp_title()
[1] "Code list for employment status"
For datapackage objects the following methods are currently defined
(this list can be obtained using ?PropertiesDatapackage):
dp_contributors(), dp_created(), dp_description(), dp_id(), dp_keywords(), dp_name(), dp_title()
For dataresource objects the following methods are currently defined
(this list can be obtained using ?PropertiesDataresource):
dp_bytes(), dp_encoding(), dp_description(), dp_format(), dp_hash(), dp_name(), dp_mediatype(), dp_path(), dp_schema(), dp_title()
The dp_path() method has a full_path
argument that, when set to TRUE, returns the full path to the Data
Resource's data and not just the path relative to the Data Package. The
full path is needed when one wants to use the path to read the data.
> dp_path(employ)
[1] "employ.csv"
> dp_path(employ, full_path = TRUE)
[1] "/tmp/RtmpxawnAg/Rinst29a841d0bc74/datapackage/examples/employ/employ.csv"
It is also possible to get other properties than the ones explicitly
mentioned above using the dp_property() method:
> dp_property(employ, "encoding")
[1] "utf-8"
It is possible for fields to have a list of categories
associated with them. Categories are usually stored inside the Field
Descriptor. However, the datapackage package also supports
lists of categories stored in a separate Data Resource (this is not part
of the datapackage standard).
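To illustrate what a list of categories does, the following base R sketch applies a small code list to a vector of codes. The codes and labels are those of the ‘employ’ field in the example Data Package; the variable names are chosen for this example only, and the datapackage package provides its own functions for this.

```r
# Code list for employment status: one row per category
categories <- data.frame(
  code = c("E", "U", "N", "X"),
  label = c("Employed", "Unemployed", "Non-working-population", "Unknown"),
  stringsAsFactors = FALSE
)

# A field as stored in the data file: only the codes
employ <- c("E", "U", "E", "N")

# Look up the label that belongs to each code
labels <- categories$label[match(employ, categories$code)]
labels

# Or convert to factor, using the codes as levels and the labels as labels
employ_factor <- factor(employ, levels = categories$code,
  labels = categories$label)
employ_factor
```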
In the example resource, the fields ‘gender’ and ‘employ’ have categories associated with them:
> dta <- dp_resource(dp, "employment") |> dp_get_data()
> dta
          id        dob gender employ  income haspartner
1  368509515 1993-06-14      M      E 2691.80      FALSE
2  187844355 1961-10-08      X      U      NA      FALSE
3  273040044 1982-06-17      F      E  533.65      FALSE
4  963831798 1965-02-15      M      E  790.13      FALSE
5  854856378 1990-01-30      F      E  716.79       TRUE
6   20072760 1961-08-18      F      E 1651.60      FALSE
7  429782078 2019-09-23      M      N      NA      FALSE
8  711292034 1994-02-16      M      E  455.98      FALSE
9  949458305 2005-06-16      F      N      NA      FALSE
10 911459071 2007-02-27      F      N      NA      FALSE
11 921370403 1981-09-14      F      E 1461.36      FALSE
12  26901869 1981-03-08      F      E 1153.02       TRUE
13 640668848 1993-08-30      M      E  708.98       TRUE
14 996464509 1960-12-10      M      E 1088.99      FALSE
15  58820512 1962-10-13      M      E 2243.10      FALSE
16 288242988 2013-06-11      M      N      NA      FALSE
17 549758863 1990-06-09      M      E  719.69       TRUE
18 998045846 1973-01-25      F      E 1312.45      FALSE
19 902078272 1962-05-29      F      E  618.27      FALSE
20 594477489 1952-08-31      -      N      NA       TRUE
The ‘employ’ field is a string column, but it has a ‘categories’ property set which points to a Data Resource in the Data Package. It is possible to get this list of
categories:
> dp_categorieslist(dta$employ)
  code                  label missing
1    E               Employed   FALSE
2    U             Unemployed   FALSE
3    N Non-working-population   FALSE
4    X                Unknown    TRUE
This list of categories can also be used to convert the field to factor:
> dp_to_factor(dta$employ)
 [1] Employed               Unemployed             Employed              
 [4] Employed               Employed               Employed              
 [7] Non-working-population Employed               Non-working-population
[10] Non-working-population Employed               Employed              
[13] Employed               Employed               Employed              
[16] Non-working-population Employed               Employed              
[19] Employed               Non-working-population
attr(,"fielddescriptor")
[employ] <string/categories> Employment status
Categories:
 value                  label
     E               Employed
     U             Unemployed
     N Non-working-population
     X                Unknown
Levels: Employed Unemployed Non-working-population Unknown
Using the convert_categories = "to_factor" argument of
dp_apply_schema() (which is called by
dp_get_data()) it is also possible to convert all fields
which have an associated ‘categories’ property to factor:
> dta <- dp_resource(dp, "employment") |> 
+   dp_get_data(convert_categories = "to_factor")
> dta
          id        dob  gender                 employ  income haspartner
1  368509515 1993-06-14    Male               Employed 2691.80      FALSE
2  187844355 1961-10-08   Other             Unemployed      NA      FALSE
3  273040044 1982-06-17  Female               Employed  533.65      FALSE
4  963831798 1965-02-15    Male               Employed  790.13      FALSE
5  854856378 1990-01-30  Female               Employed  716.79       TRUE
6   20072760 1961-08-18  Female               Employed 1651.60      FALSE
7  429782078 2019-09-23    Male Non-working-population      NA      FALSE
8  711292034 1994-02-16    Male               Employed  455.98      FALSE
9  949458305 2005-06-16  Female Non-working-population      NA      FALSE
10 911459071 2007-02-27  Female Non-working-population      NA      FALSE
11 921370403 1981-09-14  Female               Employed 1461.36      FALSE
12  26901869 1981-03-08  Female               Employed 1153.02       TRUE
13 640668848 1993-08-30    Male               Employed  708.98       TRUE
14 996464509 1960-12-10    Male               Employed 1088.99      FALSE
15  58820512 1962-10-13    Male               Employed 2243.10      FALSE
16 288242988 2013-06-11    Male Non-working-population      NA      FALSE
17 549758863 1990-06-09    Male               Employed  719.69       TRUE
18 998045846 1973-01-25  Female               Employed 1312.45      FALSE
19 902078272 1962-05-29  Female               Employed  618.27      FALSE
20 594477489 1952-08-31 Unknown Non-working-population      NA       TRUE
When the codelist
package is installed, it is also possible to convert the column to a
code vector:
> dta <- dp_resource(dp, "employment") |> 
+   dp_get_data(convert_categories = "to_code")
> dta
          id        dob     gender      employ  income haspartner
1  368509515 1993-06-14 M[Male]    E[Employed] 2691.80      FALSE
2  187844355 1961-10-08 X[Other]   U[Unemplo…]      NA      FALSE
3  273040044 1982-06-17 F[Female]  E[Employed]  533.65      FALSE
4  963831798 1965-02-15 M[Male]    E[Employed]  790.13      FALSE
5  854856378 1990-01-30 F[Female]  E[Employed]  716.79       TRUE
6   20072760 1961-08-18 F[Female]  E[Employed] 1651.60      FALSE
7  429782078 2019-09-23 M[Male]    N[Non-wor…]      NA      FALSE
8  711292034 1994-02-16 M[Male]    E[Employed]  455.98      FALSE
9  949458305 2005-06-16 F[Female]  N[Non-wor…]      NA      FALSE
10 911459071 2007-02-27 F[Female]  N[Non-wor…]      NA      FALSE
11 921370403 1981-09-14 F[Female]  E[Employed] 1461.36      FALSE
12  26901869 1981-03-08 F[Female]  E[Employed] 1153.02       TRUE
13 640668848 1993-08-30 M[Male]    E[Employed]  708.98       TRUE
14 996464509 1960-12-10 M[Male]    E[Employed] 1088.99      FALSE
15  58820512 1962-10-13 M[Male]    E[Employed] 2243.10      FALSE
16 288242988 2013-06-11 M[Male]    N[Non-wor…]      NA      FALSE
17 549758863 1990-06-09 M[Male]    E[Employed]  719.69       TRUE
18 998045846 1973-01-25 F[Female]  E[Employed] 1312.45      FALSE
19 902078272 1962-05-29 F[Female]  E[Employed]  618.27      FALSE
20 594477489 1952-08-31 -[Unknown] N[Non-wor…]      NA       TRUE
This has the advantage that the values/codes and the labels are kept together, and both can be used when coding, which can make code safer and more readable:
> library(codelist)
> dta[dta$gender == "X", ]
         id        dob   gender      employ income haspartner
2 187844355 1961-10-08 X[Other] U[Unemplo…]     NA      FALSE
> dta[dta$gender == as.label("Other"), ]
         id        dob   gender      employ income haspartner
2 187844355 1961-10-08 X[Other] U[Unemplo…]     NA      FALSE
How to create a Data Package is shown in a separate vignette
Creating a Data Package
A quick way to create a Data Package from a given dataset is with the
dp_save_as_datapackage() function:
> dir <- tempfile()
> data(iris)
> dp_save_as_datapackage(iris, dir)
The Data Package can then be read back with dp_load_from_datapackage():
> dp_load_from_datapackage(dir) |> head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2       1
2          4.9         3.0          1.4         0.2       1
3          4.7         3.2          1.3         0.2       1
4          4.6         3.1          1.5         0.2       1
5          5.0         3.6          1.4         0.2       1
6          5.4         3.9          1.7         0.4       1
This will either load the Data Resource with the same name as the
Data Package or the first resource in the Data Package. It is also
possible to specify the name of the Data Resource that should be read.
Additional arguments are passed on to dp_get_data():
> dp_load_from_datapackage(dir, "iris", convert_categories = "to_factor", 
+   use_fread = TRUE)
Loading required namespace: data.table
 
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
            <num>       <num>        <num>       <num>    <fctr>
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica