
Mailing List Archive: Lucene: Java-User

LuceneIndex export to SQL-database

 

 



andreas.nowitzki at anno-edv

Aug 15, 2012, 10:39 AM

Post #1 of 3 (237 views)

I am using Lucene to build several indexes from HTML sites.
To work with them, I convert the Lucene index into SQL with a small
program. I export only a small subset of the collected fields
(dataSource, plainTextContent, title, description and keyword).
In most cases, however, a document has more than one field named #keyword,
and I get only the first one. For my SQL database I want to use all values
of #keyword. How can this be done?

Below is the script used for the conversion.
Can anyone help me find a solution?
With kind regards

Andreas

#######################################################
# Character encoding is UTF-8!!!!
#
#
# This file specifies all necessary parameters in order to build a csv file or a
# database table out of a Lucene index. The fields that should be transferred
# can be specified, together with the database location.
#


#The path to the lucene index
luceneIndexPath=r:\23._Neue_Einteilung_Indexer\2._Indexer2\2012\2012-08\index165\

#These attributes will be considered for conversion
#attribute2convert=urn:catwiesel:attribute:uri
attribute2convert=http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title
attribute2convert=http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent
attribute2convert=http://www.semanticdesktop.org/ontologies/2007/01/19/nie#description
#attribute2convert=urn:dynaq:buzzwords
attribute2convert=http://www.semanticdesktop.org/ontologies/2007/01/19/nie#dataSource
attribute2convert=http://www.semanticdesktop.org/ontologies/2007/01/19/nie#keyword


# This is for creating / appending / overwriting database tables
dataBaseConversion=
{
# The name of the database table that should be generated / appended / overwritten
tableName=165_082012
#true: the database table will be overwritten, if it exists. false: the data entries will be appended
overwriteIfExist=true

# These are the parameters for the connection. Yes, the password is NOT so secure here...I'm sorry for that
username=root
password=

# This is the connection string to the database. There, the database location and the database name are specified
# The connection string depends on your database.
# e.g. databaseURL=jdbc:mysql://[host:port]/[database]
databaseURL=jdbc:mysql://127.0.0.1:3306/luceneexport


# This is the driver for your database
# e.g. databaseDriver=org.hsqldb.jdbcDriver
# e.g. databaseDriver=com.mysql.jdbc.Driver
databaseDriver=com.mysql.jdbc.Driver

# This is the character that will be used to quote the table column names in the SQL statements. Examples are:
#No quoting (also could comment out the line):tableColumnsQuoteChar=
# ANSI-standard: tableColumnsQuoteChar="
# MySQL: tableColumnsQuoteChar=`
tableColumnsQuoteChar=`

# Further, the database type of each attribute has to be specified in order to create the database table.
# A new attribute name for the database column can also be specified (note that databases sometimes
# restrict the length of column names). E.g.:
# urn:dynaq:buzzwords=
# {
# columnType=TEXT
# columnName=buzzwords
# }

urn:catwiesel:attribute:uri=
{
columnType=TEXT
columnName=uri
}
http://www.semanticdesktop.org/ontologies/2007/01/19/nie#dataSource=
{
columnType=TEXT
columnName=dataSource
}

http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent=
{
columnType=LONGTEXT
columnName=plainTextContent
}
http://www.semanticdesktop.org/ontologies/2007/01/19/nie#description=
{
columnType=LONGTEXT
columnName=metadescription
}
http://www.semanticdesktop.org/ontologies/2007/01/19/nie#title=
{
columnType=TEXT
columnName=title
}

#urn:dynaq:buzzwords=
#{
# columnType=TEXT
# columnName=buzzwords
#}

http://www.semanticdesktop.org/ontologies/2007/01/19/nie#keyword=
{
columnType=LONGTEXT
columnName=metakeyword
}

#http://www.semanticdesktop.org/ontologies/2007/01/19/nie#mimeType=
#{
# columnType=TEXT
# columnName=mimeType
#}



}



--
View this message in context: http://lucene.472066.n3.nabble.com/LuceneIndex-export-to-SQL-database-tp4001450.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ian.lea at gmail

Aug 16, 2012, 3:01 AM

Post #2 of 3 (225 views)
Re: LuceneIndex export to SQL-database

Is this a Lucene question or a MySQL question or what? Since this is
the Lucene list, let's assume you're asking how to get multiple
values for a field from an index. Document.getValues("keyword") looks
promising: it returns an array of values of the field specified.
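
If the export should keep a single metakeyword column, one option is to join all values of the field into one string before writing the row. The following is only a rough sketch under assumptions: the class KeywordJoiner and its join method are made up for illustration, the String[] stands in for what Document.getValues(...) returns for the keyword field, and it is not known whether the conversion program can be extended this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collapses the values of a multi-valued field into one column value. */
final class KeywordJoiner {

    private KeywordJoiner() {}

    /**
     * Joins all non-blank values with the given separator.
     * "values" stands in for the String[] returned by Document.getValues(fieldName).
     */
    static String join(String[] values, String separator) {
        if (values == null) {
            return "";
        }
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            // Skip missing or blank entries and trim surrounding whitespace.
            if (value != null && !value.trim().isEmpty()) {
                kept.add(value.trim());
            }
        }
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        // Stand-in for the values of the nie#keyword field of one document (sample data).
        String[] keywords = {"lucene", " sql ", "", "export"};
        System.out.println(join(keywords, ", "));
    }
}
```

The separator has to be a string that cannot occur inside a keyword, otherwise the values cannot be split apart again on the SQL side.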


--
Ian.




shubalubdub at gmail

Aug 16, 2012, 11:10 AM

Post #3 of 3 (224 views)
Re: LuceneIndex export to SQL-database

The #keyword values need to be stored in a separate table that references
the main table, so that you can have multiple #keyword rows per document row
in the main table. I have no idea how to make your script do this, and I
don't even know if it can. You might want to check the documentation or
discussion list for your script to find out how it handles multivalued
fields; if it does handle them, it might use a somewhat different
method from the one I described above.
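
To make the separate-table idea a bit more concrete, here is a rough sketch of how the keyword values could be flattened into one row per (document, keyword) pair. Everything in it is an assumption for illustration: the class KeywordRows, the table names document and doc_keyword, the column names, and the numeric document id, which the exported table in the original post does not have. The SQL strings are only meant to show the shape of the child table and are not executed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns multi-valued keywords into rows for a child table. */
final class KeywordRows {

    /** One row of the assumed child table: (doc_id, keyword). */
    static final class Row {
        final int docId;
        final String keyword;

        Row(int docId, String keyword) {
            this.docId = docId;
            this.keyword = keyword;
        }
    }

    // Assumed MySQL schema; requires a parent table `document` with a primary key `id`.
    static final String CREATE_CHILD_TABLE =
            "CREATE TABLE `doc_keyword` ("
            + "`doc_id` INT NOT NULL, "
            + "`keyword` VARCHAR(255) NOT NULL, "
            + "FOREIGN KEY (`doc_id`) REFERENCES `document` (`id`))";

    // Parameterized insert, intended for a java.sql.PreparedStatement.
    static final String INSERT_ROW =
            "INSERT INTO `doc_keyword` (`doc_id`, `keyword`) VALUES (?, ?)";

    /** Emits one child row per keyword; documents without keywords yield no rows. */
    static List<Row> toRows(Map<Integer, String[]> keywordsByDocId) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<Integer, String[]> entry : keywordsByDocId.entrySet()) {
            for (String keyword : entry.getValue()) {
                rows.add(new Row(entry.getKey(), keyword));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data: document id -> all values of its keyword field.
        Map<Integer, String[]> docs = new LinkedHashMap<>();
        docs.put(1, new String[] {"lucene", "index"});
        docs.put(2, new String[] {"sql"});
        for (Row row : toRows(docs)) {
            System.out.println(row.docId + "\t" + row.keyword);
        }
    }
}
```

With this layout, all keywords of a document can be fetched by joining doc_keyword to the main table on the document id, and a single keyword can be searched without string matching inside a combined column.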

Good luck!

