Home

Using Tamil unicode strings efficiently in python

string = "தமிழ்"
print(list(string))

Output:

['த', 'ம', 'ி', 'ழ', '்']

Normally, how python reads the unicode strings, this seems like python did not recoganize properly and let we see how tamilstring helps to handle this

1
2
3
from tamilString import String
string = "தமிழ்"
string = String(string)

Output:

['த', 'மி', 'ழ்']

output shows that using tamilstring helps to handle tamilstring unicodes more effectively.

Alternatives,

pip install Open-Tamil

Advantages why you have to choose tamilstring, it gives more feture comparing to alternatives, it will handle sanskrit better

from tamil.utf8 import get_letters as op_get_letters
from tamilstring import get_letters as ta_get_letters

print(op_get_letters("க்ஷ்"))
print(ta_get_letters("க்ஷ்"))

Output:

['க்', 'ஷ்']
['க்ஷ்']

print(op_get_letters("ஶ்ரீ"))
print(ta_get_letters("ஶ்ரீ"))

print(op_get_letters("ஸ்ரீ"))
print(ta_get_letters("ஸ்ரீ"))

Output:

['ஶ்', 'ரீ'] 
['ஶ்ரீ']
['ஸ்', 'ரீ']
['ஸ்ரீ']

when you notice that this both 'ஸ்ரீ','ஶ்ரீ' ( this looking both differnet from mobile ) are used to refer same letter

soon cython implemetation

DisAdvantages

time consuming

To explore what more tamilstring provides