Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#597 closed defect (fixed)

sub-soi name incorrectly counted as two names

Reported by: stephankn Owned by: jocelyn
Priority: major Component: osmose-backend
Keywords: Cc:

Description

In Thailand many side streets (soi) carry a sub-number (sub-soi).
This happens when new alleys are built between existing ones and renumbering is not possible.

So between soi 5 and soi 6 you could have soi 5/1 (and sometimes 5/2, ...)

Maybe by checking the length of the second part it could be distinguished from a name. If it's more than 2 characters it's likely a name.

http://osmose.openstreetmap.fr/en/error/1045816056

Change History (11)

comment:1 Changed 5 years ago by frodrigo

Your error URL is gone. Can you point to an area with things like this, or to OSM ids? Thank you.

comment:3 Changed 5 years ago by frodrigo

  • Owner changed from frodrigo to jocelyn
  • Status changed from new to assigned

comment:4 follow-up: Changed 5 years ago by stephankn

Situation improved a lot with last fix, still reports this false positive when numbers are written in Thai script:
"ป่าปี้ ซอย ๖/๑"

http://www.openstreetmap.org/way/153774102

have you considered taking into account the length of the parts? If a part is only one or two characters long, it's unlikely to be a secondary name...

comment:5 Changed 5 years ago by stephankn

Another false positive. Again a minimum length required for the "second part" might fix it.

This facility is called "One+"

http://www.openstreetmap.org/way/200708984

comment:6 in reply to: ↑ 4 Changed 5 years ago by frodrigo

Replying to stephankn:

> Situation improved a lot with last fix, still reports this false positive when numbers are written in Thai script:
> "ป่าปี้ ซอย ๖/๑"
>
> http://www.openstreetmap.org/way/153774102
>
> have you considered taking into account the length of the parts? If a part is only one or two characters long, it's unlikely to be a secondary name...

Yes, number only.

Can you provide me with the list of Thai numerals?

comment:7 Changed 5 years ago by stephankn

Thai numbers:
http://fr.wikipedia.org/wiki/Num%C3%A9ration_tha%C3%AFe
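
For reference, a minimal sketch of how the Thai digits could be enumerated in Python 3 rather than copied by hand. It assumes only that the Thai digits occupy the Unicode range U+0E50 to U+0E59; the name `THAI_DIGITS` is made up for illustration and is not part of osmose-backend.

```python
import unicodedata

# Thai digits zero to nine occupy U+0E50..U+0E59 in Unicode.
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))

# Each character is a decimal digit (category Nd), so the standard
# library can already convert it to its numeric value.
values = [unicodedata.digit(c) for c in THAI_DIGITS]

# int() accepts Unicode decimal digits as well, so the two parts of
# the reported name "... ๖/๑" parse as ordinary integers.
first, second = (int(part) for part in "๖/๑".split("/"))
```

Because these characters are ordinary decimal digits for Unicode purposes, a `\d` in a Python 3 `str` pattern matches them too.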

How about changing the original regular expressions to the ones below, which require the second part to be at least 3 characters long to match? This should not cause too many false negatives and should catch most false positives.

Do you have a database of reported false positives to match against? Grepping for names matching ".*[;+/].{,2}$" could produce a list for a quick review of whether there are cases that represent a genuine secondary name and would become false negatives.
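
A rough sketch of that review step in Python, assuming the reported names are available as a plain list of strings (the `sample` list and the helper `short_tail_names` are made up for illustration; the first two entries are the names reported in this ticket):

```python
import re

# Pattern from the comment above: a separator (; + /) followed by
# at most two characters before the end of the name.
SHORT_TAIL = re.compile(r".*[;+/].{,2}$")

def short_tail_names(names):
    """Return the names whose part after the last separator is 0-2 characters long."""
    return [name for name in names if SHORT_TAIL.match(name)]

sample = [
    "ป่าปี้ ซอย ๖/๑",            # Thai-script sub-soi from this ticket
    "One+",                     # facility name from this ticket
    "Soi 5/1",                  # sub-soi in Latin digits
    "Main Street/High Street",  # hypothetical genuine double name
    "Plain Name",               # no separator at all
]

for name in short_tail_names(sample):
    print(name)
```

Names printed by this loop are the ones the stricter patterns would stop reporting, so they are the candidates to check by hand for real secondary names.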

How often is the plus character used to put in multiple names in the name tag? I have never seen such tagging. Most common would be "/", ";" and maybe the dash "-".

Slightly related:
Other cases you're missing are translations in brackets which should go into name:xx tags and not be in the name tag. That would probably be worth a special analyzer.

        self.Re1 = re.compile(u"^.*;.{3,}$")
        self.Re2 = re.compile(u"^.*/.{3,}$")
        self.Re3 = re.compile(u"^.*\\+.{3,}$")

If this causes too many false negatives: are the short second parts letters? Then also match short second parts, but only when they consist of letters:

        self.Re2 = re.compile(u"^.*/(.{3,}|[a-zA-Z ]{,2})$")
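
To illustrate, here is a small standalone sketch that runs the proposed patterns against the names from this ticket. It is not the osmose-backend analyser itself; the function `looks_like_multiple_names` and the names `Hall A/B` and `Main Street/High Street` are hypothetical test inputs.

```python
import re

# Proposed patterns: at least 3 characters must follow the separator.
RE_SEMICOLON = re.compile(r"^.*;.{3,}$")
RE_SLASH = re.compile(r"^.*/.{3,}$")
RE_PLUS = re.compile(r"^.*\+.{3,}$")

# Variant for "/": also match a short tail (0-2 characters) made of
# ASCII letters or spaces, so "A/B" is still reported but "5/1" is not.
RE_SLASH_LETTERS = re.compile(r"^.*/(.{3,}|[a-zA-Z ]{,2})$")

def looks_like_multiple_names(name):
    """True if any of the three strict patterns matches the name."""
    return any(p.match(name) for p in (RE_SEMICOLON, RE_SLASH, RE_PLUS))

for name in ["ป่าปี้ ซอย ๖/๑", "One+", "Soi 5/1", "Main Street/High Street"]:
    print(name, looks_like_multiple_names(name))
```

Note that `.{3,}` applies to the text after any separator, not only the last one, so a name such as `Soi 5/1/2` would still match because `1/2` is three characters long.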

comment:8 Changed 5 years ago by frodrigo

8b8b8c8

comment:9 Changed 5 years ago by jocelyn

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:10 Changed 5 years ago by frodrigo

Vietnam has a similar issue. Does it need to be addressed in the same way?

http://osmose.openstreetmap.fr/fr/errors/?country=vietnam&item=5030

comment:11 Changed 5 years ago by stephankn

I'm no expert on Vietnamese addressing schemes; I have been there only once.

By looking at the data I would say yes. Some of these certainly look genuine.

Not sure about the pattern. Maybe ignore the problem if the name contains <number>/<number>.
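
One possible shape for such an exception, as a sketch only: the pattern and the helper `is_numbered_sub_street` are assumptions, not what the fix actually implemented. It relies on `\d` matching any Unicode decimal digit in Python 3 `str` patterns, so Thai-script numbers are covered as well.

```python
import re

# Hypothetical exception pattern: a number, a slash, and another number
# anywhere in the name. \d matches any Unicode decimal digit here.
NUMBER_SLASH_NUMBER = re.compile(r"\d+/\d+")

def is_numbered_sub_street(name):
    """True if the name contains <number>/<number> and could be skipped."""
    return NUMBER_SLASH_NUMBER.search(name) is not None

for name in ["Soi 5/1", "ป่าปี้ ซอย ๖/๑", "Main Street/High Street", "One+"]:
    print(name, is_numbered_sub_street(name))
```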
