This page is outdated. It explains how the data was generated when GTFS was not available.

The page explains how public transport data is collected and processed in order to generate a public transport network and travel time matrices. The page refers to the following repositories on GitHub:

-       transit: https://github.com/cllorca1/transit

-       munichMatsim (MATSim implementation):  https://github.com/cllorca1/matsimMunich

Input and output data are stored in GitLab repositories (currently private, owned by carlos.llorca).

1.      Data extraction

1.1.  Download OSM data

Download the .osm files from Geofabrik.de. Cut them to the study area using the Osmosis tools. Extract the data relevant to public transport; this required filtering by several OSM tags, including public_transport=stop_position and route.

1.2.  Extract transit stops and lines

The previous .osm files (equivalent to .xml files) are read in Java and the public transport information is extracted. The details can be found in the program “transit”, in the class “ReadXmlFile.java”.
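
As an illustration of this step, the following is a minimal Python sketch (not the Java code of “ReadXmlFile.java”) that reads an OSM XML extract and keeps the nodes tagged public_transport=stop_position. The sample data, the function name extract_stops and the output field names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM extract, used only for illustration.
SAMPLE_OSM = """
<osm version="0.6">
  <node id="1" lat="48.1400" lon="11.5600">
    <tag k="public_transport" v="stop_position"/>
    <tag k="name" v="Example stop"/>
    <tag k="bus" v="yes"/>
  </node>
  <node id="2" lat="48.1500" lon="11.5700">
    <tag k="highway" v="crossing"/>
  </node>
</osm>
"""


def extract_stops(osm_xml):
    """Return one dict per node tagged public_transport=stop_position."""
    root = ET.fromstring(osm_xml.strip())
    stops = []
    for node in root.iter("node"):
        # Collect the key/value tags of this node into a plain dict.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get("public_transport") != "stop_position":
            continue
        stops.append({
            "stop_id": int(node.get("id")),
            "stop_name": tags.get("name", ""),
            "coordinates": (float(node.get("lat")), float(node.get("lon"))),
            "bus": tags.get("bus") == "yes",
            "tram": tags.get("tram") == "yes",
        })
    return stops


stops = extract_stops(SAMPLE_OSM)
```

Full regional .osm files are large, so a streaming parser would probably be preferable in practice; the in-memory parse above is only meant to show the tag filter.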

1.3.  Data structure

Inside the Java program “transit”, the data is stored in a GTFS-like structure. The major data types are as follows:

-       TransitStop: stop or station (variables: stop_id, stop_name, coordinates, bus, metro, tram).

-       TransitLine: sequence of stops (variables: line_id, headway, line_type, line_name, fromStop, toStop, stopList<Integer sequence, Long stop_id>).

-       TransitTrip: an actual service on a line (variables: line_id, departure_time, frequency, stopToStopList<Integer sequence, StopToStop stop_to_stop>).

-       TransitStopToStop: movement of a vehicle from one stop to the next (variables: fromStop, toStop, departureTime, arrivalTime).

-       LineType: type of line.

-       VehicleType: type and characteristics of a vehicle.

More details in the package “transitSystem”.
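
To make the relations between these types easier to see, here is a hedged Python sketch of the same structure using dataclasses. It is not the Java code of the package “transitSystem”: the time units (seconds) are assumptions, and fromStop/toStop are derived from the stop list here instead of being stored as separate variables.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TransitStop:
    stop_id: int
    stop_name: str
    coordinates: Tuple[float, float]
    bus: bool = False
    metro: bool = False
    tram: bool = False


@dataclass
class TransitStopToStop:
    from_stop: int
    to_stop: int
    departure_time: int  # seconds after midnight (assumed unit)
    arrival_time: int    # seconds after midnight (assumed unit)


@dataclass
class TransitLine:
    line_id: int
    line_name: str
    line_type: str
    headway: int  # seconds (assumed unit)
    # Maps the position in the sequence to the stop_id.
    stop_list: Dict[int, int] = field(default_factory=dict)

    @property
    def from_stop(self) -> int:
        return self.stop_list[min(self.stop_list)]

    @property
    def to_stop(self) -> int:
        return self.stop_list[max(self.stop_list)]


@dataclass
class TransitTrip:
    line_id: int
    departure_time: int
    frequency: int
    # Maps the position in the sequence to one stop-to-stop movement.
    stop_to_stop_list: Dict[int, TransitStopToStop] = field(default_factory=dict)


line = TransitLine(line_id=1, line_name="Line 1", line_type="metro",
                   headway=600, stop_list={0: 11, 1: 12, 2: 13})
```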

Data accessible here (version 26.04.2018): 

frequencies_all.csv 

lines_all_clean.csv

links_all_clean.csv

stations_all_clean.csv

trips_all_clean.csv

vehicleTypes_all.csv


1.4.  Collect travel times from Google Maps

Once the previous data is read (either from .osm, which is very slow, or from already generated .csv files, as explained in 2.1), the program “transit” can download the travel times between successive stops using the Google Directions API. More details in the package “travelTimeFromGoogle”.

Using the API requires a personal API key and is limited to 2,500 queries per day.
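
The sketch below shows, under stated assumptions, how a request for one stop-to-stop pair could be built and how a list of pairs could be cut to the daily quota. It is written in Python and is not the code of “travelTimeFromGoogle”; the helper names build_request_url and plan_queries are hypothetical, and the sketch only builds the URL without sending any request.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"
DAILY_QUOTA = 2500  # queries per day, as stated above


def build_request_url(origin, destination, departure_time, api_key):
    """Build a transit request between two (lat, lon) coordinates."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": "transit",
        "departure_time": departure_time,  # seconds since the Unix epoch
        "key": api_key,
    }
    return DIRECTIONS_URL + "?" + urlencode(params)


def plan_queries(stop_pairs, already_used=0):
    """Split the pairs into those that fit in today's quota and the rest."""
    remaining = max(DAILY_QUOTA - already_used, 0)
    return stop_pairs[:remaining], stop_pairs[remaining:]


url = build_request_url((48.14, 11.56), (48.15, 11.57), 1524729600, "MY_KEY")
today, later = plan_queries(list(range(3000)))
```

Pairs that do not fit in the quota can be kept for the next day or handled with the MATSim-based method of section 1.5.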

1.5.  Collect travel times using MATSim

Because not all lines are present in Google Maps, and for cases where the API quota is exceeded, the program “transit” can also calculate travel times between stops using the MATSim network. The program runs MATSim in non-congested conditions and queries travel times from coordinate to coordinate. More details in the package “travelTimeUsingMatsim”.

1.6.  Add external data informing about vehicles and frequencies

The previous methods do not make it easy to obtain the frequency and vehicle type of each line. This information is therefore read from separate files (see importOsm/VehicleTypeReader.java and importOsm/FrequencyReader).

The following types of vehicles and/or lines are used (they differ in vehicle size and in line frequency or headway):

-       Rural bus

-       Town bus

-       City bus

-       Tram

-       U-Bahn

-       S-Bahn

1.7.  About the coverage of the data

Google Maps mainly contains data for the MVV area and for long-distance services. Most rural areas are not covered. The other core cities are only partially covered (e.g. tram services but not every bus).

OSM contains every (or almost every) stop and line in the area. In some cases, the order of stops could not be extracted properly from the .osm files and is erroneous.

2.      Transit network generation

2.1.  Print out data as table

At any point of the data generation, the program “transit” prints out .csv files with the data, so they can be re-read and processed again. The following files are printed out:

-       Stops

-       Lines

-       Trips

-       Links

2.2.  Print out data as XML-MATSim files

In order to be used in MATSim, the program “transit” creates .xml files for the network, the schedule and the vehicles. The files are printed out as text files keeping the .xml structure specified by MATSim.

To do: it would be a good improvement to use an XML library (probably from the MATSim code) to write the data from Java. This would reduce the risk of writing errors (see package “writeMATSimXmlFiles”).

The following assumptions are made for each file:

-       Network: The network is completely independent from the road network. Links are segments between stops, with unlimited capacity and a free-flow speed higher than the speed needed to arrive at the next stop on schedule (based on the travel times between stops). Parallel lines (e.g. the Stammstrecke) have multiple overlapping links. This means that the effect of congestion on bus travel times is not considered.

-       Schedule: The schedule is based on the transit lines and transit trips obtained in sections 1.4 and 1.5, and on the headways read in section 1.6. Departures are created between 3:00 and 22:00, at the line headway. A dwell time of 20 seconds is added at every stop.

-       Vehicles: Each service is run by a different vehicle. Note that, as a result, it is not possible to analyze the operation of the fleets.
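
The schedule assumption can be sketched as follows. This is a minimal Python illustration, not the code of “writeMATSimXmlFiles”: the function names are hypothetical, times are in seconds after midnight, and it is assumed that the departure at 22:00 itself is still included.

```python
FIRST_DEPARTURE = 3 * 3600   # 3:00
LAST_DEPARTURE = 22 * 3600   # 22:00 (assumed to be inclusive)
DWELL_TIME = 20              # seconds added at every stop


def departures(headway):
    """Departure times of one line, one every `headway` seconds."""
    return list(range(FIRST_DEPARTURE, LAST_DEPARTURE + 1, headway))


def stop_times(departure, travel_times):
    """Return (arrival, departure) at every stop of a single trip.

    `travel_times` holds the stop-to-stop travel times in seconds.
    """
    times = [(departure, departure)]
    current = departure
    for travel_time in travel_times:
        arrival = current + travel_time
        current = arrival + DWELL_TIME
        times.append((arrival, current))
    return times


line_departures = departures(600)
first_trip = stop_times(line_departures[0], [120, 180])
```

Because each departure is served by its own vehicle, the number of vehicles written out equals the number of departures over all lines.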

Data is accessible here (version 26.04.2018): 

network2018.xml

schedule2018.xml

vehicles2018.xml

2.3.  Print out data by mode

For some applications, it is required to segment the public transport supply. The program “transit” can divide the supply into 1) only bus, 2) bus, tram and metro or 3) all the services.

3.      Skim matrices

The calculation of travel time matrices is carried out in the program munichMatsim by using MATSim. The process is used to calculate total travel time, in-vehicle travel time, in-public-transport travel time, access time, egress time and number of transfers. 

The shortest route using transit is queried from each origin to each destination. The process of calculating the different components of public transport travel time can be found in: https://github.com/cllorca1/matsimMunich/blob/munich/src/main/java/org/matsim/munichArea/RunMATSimTransitSkims.java


3.1.  Find TAZ with transit stops (served zones)

The TAZs that have at least one transit stop are defined as served zones. There are 2700 served zones out of a total of 4900.

3.2.  Generate synthetic travel demand for served zones

To calculate travel times between served zones, a synthetic passenger is sent from every origin to every destination (only half of the matrix, assuming that return trips take the same time). This creates a total of 2700 ^ 2 / 2 O/D pairs (approx. 3.6 million, i.e. half of the approx. 7.3 million cells of the full matrix).

The trips start and end at the node closest to the zone centroid (deterministic) and depart at a random time uniformly distributed between 4:00 and 22:00. Trips are assigned to the transit network. The best runtime is obtained when the travelers are split into 50 to 200 independent simulations. Splitting also helps to keep the network nearly empty, so only one iteration is used.
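
A minimal Python sketch of this demand generation is shown below. It is not the code of munichMatsim; the function names, the fixed random seed and the round-robin split are assumptions made for the example.

```python
import random
from itertools import combinations

EARLIEST = 4 * 3600   # 4:00 in seconds after midnight
LATEST = 22 * 3600    # 22:00 in seconds after midnight


def synthetic_trips(served_zones, seed=0):
    """One trip per unordered pair of served zones (half of the matrix)."""
    rng = random.Random(seed)
    trips = []
    for origin, destination in combinations(served_zones, 2):
        departure = rng.uniform(EARLIEST, LATEST)
        trips.append((origin, destination, departure))
    return trips


def split_into_runs(trips, n_runs):
    """Distribute the trips over n_runs independent simulations."""
    return [trips[i::n_runs] for i in range(n_runs)]


trips = synthetic_trips(range(10))
runs = split_into_runs(trips, 4)
```

With 10 zones this gives 45 trips; the pairs with identical origin and destination are left out in this sketch.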

3.3.  Matrix specification and analysis of events

After each model run, the MATSim event file is analyzed. The following matrices are calculated:

-       Total time: time between the agent’s departure from home until the agent’s arrival at destination activity.

-       Access time: walking time between the agent’s departure and the arrival at the first transit facility.

-       In transit time: total time spent in the transit system or in transfer (from the arrival at the first station until the departure from the last station).

-       In vehicle time: time spent in the vehicle(s).

-       Egress time: walking time between the agent’s arrival at the last stop and the agent’s arrival at the destination activity.

The following simplifications apply: return trips are not calculated, but set equal to the outbound trip. The access and egress times are swapped, but the in transit time and the in vehicle time are not changed.
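
The return-trip simplification can be sketched in Python as follows. The dictionary layout and the function name add_return_trips are assumptions made for the example; they do not come from munichMatsim.

```python
def add_return_trips(skims):
    """Fill the (destination, origin) cells from the outbound trips.

    `skims` maps (origin, destination) to a dict of time components.
    Access and egress are swapped; all other components are copied.
    """
    full = dict(skims)
    for (origin, destination), values in skims.items():
        if (destination, origin) in skims:
            continue  # keep a cell that was actually calculated
        back = dict(values)
        back["access"] = values["egress"]
        back["egress"] = values["access"]
        full[(destination, origin)] = back
    return full


outbound = {(1, 2): {"total": 1500, "access": 300, "egress": 200,
                     "in_transit": 1000, "in_vehicle": 800}}
full_matrix = add_return_trips(outbound)
```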

A matrix that describes the route is calculated as well. It is stored as a .csv file.

3.4.  Remove trips made by walk only

The trips that do not involve any in vehicle time are removed. For some O/D pairs, MATSim found that the optimal trip was to walk the whole way without entering the transit system.

3.5.  Remove long access and egress times

The trips with access time or egress time higher than 30 minutes are removed.
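
Both filters (sections 3.4 and 3.5) can be sketched together in Python. The field names and the function name keep_trip are assumptions made for the example; times are in seconds.

```python
MAX_WALK = 30 * 60  # 30 minutes, in seconds


def keep_trip(trip):
    """True if the trip uses a vehicle and has acceptable walking times."""
    uses_vehicle = trip["in_vehicle"] > 0
    short_access = trip["access"] <= MAX_WALK
    short_egress = trip["egress"] <= MAX_WALK
    return uses_vehicle and short_access and short_egress


trips = [
    {"in_vehicle": 600, "access": 300, "egress": 240},   # kept
    {"in_vehicle": 0, "access": 900, "egress": 0},       # walk only
    {"in_vehicle": 600, "access": 2400, "egress": 240},  # access too long
]
kept = [trip for trip in trips if keep_trip(trip)]
```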

To do: note that very long transfers by walk might still be possible. Check if MATSim accepts such long transfers.  

3.6.  Calculate travel times at non-served zones

The full matrix from 4900 zones to 4900 zones is filled following this process:

-       From each zone i not served by transit:

  • To each zone j:
    • For each zone k served by transit (access point):
      • For each zone l served by transit (egress point):
        • If k and l are connected by transit, and walking time (i,k) and walking time (l,j) are both under 30 min:
          • Is this the fastest connection found so far?
          • If so, choose k and l
      • next l
    • next k
    • Fill the positions (i,j) and (j,i) with:
      • total time (i,j) = in transit time (k,l) + walking time (i,k) + walking time (l,j) (same for return trip)
      • access time (i,j) = walking time (i,k) (return is walking time (l,j))
      • egress time (i,j) = walking time (l,j) (return is walking time (i,k))
      • in transit time (i,j) = in transit time (k,l) (same for return trip)
      • in vehicle time (i,j) = in vehicle time (k,l) (same for return trip)
      • transfers (i,j) = transfers (k,l) (same for return trip)
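
The procedure above can be sketched in Python for a single pair (i, j). This is an illustration under assumptions, not the code of munichMatsim: the function name fill_pair and the dictionary layout are hypothetical, and "fastest connection" is interpreted here as the smallest total time including both walking legs.

```python
import math

MAX_WALK = 30 * 60  # walking legs must be under 30 minutes (seconds)


def fill_pair(i, j, served, walk, skims):
    """Return the cells (i, j) and (j, i), or {} if no connection exists.

    `walk` maps (zone, zone) to a walking time; `skims` maps a served
    pair (k, l) to its in_transit, in_vehicle and transfers values.
    """
    best = None
    for k in served:  # access point
        access = walk.get((i, k), math.inf)
        if access >= MAX_WALK:
            continue
        for l in served:  # egress point
            egress = walk.get((l, j), math.inf)
            if egress >= MAX_WALK or (k, l) not in skims:
                continue
            total = access + skims[(k, l)]["in_transit"] + egress
            # Assumption: the fastest connection minimizes the total time.
            if best is None or total < best["total"]:
                best = {"total": total, "access": access, "egress": egress,
                        "in_transit": skims[(k, l)]["in_transit"],
                        "in_vehicle": skims[(k, l)]["in_vehicle"],
                        "transfers": skims[(k, l)]["transfers"]}
    if best is None:
        return {}
    # Return trip: access and egress are swapped, the rest is unchanged.
    back = dict(best, access=best["egress"], egress=best["access"])
    return {(i, j): best, (j, i): back}


served = ["A", "B"]
walk = {("x", "A"): 600, ("x", "B"): 1500, ("A", "y"): 1700, ("B", "y"): 300}
skims = {("A", "B"): {"in_transit": 900, "in_vehicle": 700, "transfers": 1},
         ("B", "A"): {"in_transit": 900, "in_vehicle": 700, "transfers": 1}}
cells = fill_pair("x", "y", served, walk, skims)
```

The two nested loops over served zones make this expensive for 4900 zones; restricting k and l to the served zones within walking distance of i and j is one possible way to reduce the search.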


3.7.  Generate skim matrices by mode

The assignment is done three times: 1) only with bus, 2) with bus, tram and metro, 3) with every line.

After that, the matrices from (1) and (2) are compared. Where a cell of the in vehicle time matrix of (2) has the same value as in (1), the entry is removed from (2), because it is a bus-only trip that does not use metro or tram. Where a cell of the in vehicle time matrix of (3) has the same value as in (2), the entry is removed from (3), because it is a bus, tram or metro trip that does not use train.

To do: do we need to compare 1 and 3 as well?

Removing an entry from these matrices indicates that the fastest route does not use the mentioned mode. Of course, it does not mean that the agent could not use that mode. It solely means that it was not the best option.
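
The comparison between two mode sets can be sketched in Python as below. The function name remove_unchanged and the dictionary layout (O/D pair to in vehicle time) are assumptions made for the example, and exact equality is used, as in the description above.

```python
def remove_unchanged(richer, poorer):
    """Drop the cells of `richer` whose value equals the one in `poorer`.

    `richer` is the matrix with more modes available, e.g. (2);
    `poorer` is the matrix with fewer modes available, e.g. (1).
    """
    return {od: time for od, time in richer.items()
            if poorer.get(od) != time}


bus_only = {(1, 2): 600, (1, 3): 900}
bus_tram_metro = {(1, 2): 600, (1, 3): 700, (2, 3): 400}
cleaned = remove_unchanged(bus_tram_metro, bus_only)
```

If the times are stored as floating-point values, a small tolerance may be preferable to exact equality.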

The matrices are found in the shared folder and are not updated in the wiki. 
