SQL Swiss Army Knife - Engine for Diverse Data Sources
Musoq is a powerful engine designed to apply SQL syntax across a variety of data sources, making data querying more intuitive and accessible. Whether it's files, directories, comma separated values, or even complex data structures, Musoq simplifies data access.
...and many more
You can run it from your CLI. Just follow the instructions from CLI repository
*
to select all columns.column.Name
.API of the engine were improved so it is possible now to integrate seamlessly with LLMs. For example, I made a custom plugin that uses enhanced syntax and query the invoice file based on pdf.co and GPT 4. This is the query I have constructed:
table PdfInvoice {
ItemPosition 'int',
ItemName 'string',
ItemPrice 'decimal'
};
couple #custom.invoices with table PdfInvoice as SourceOfInvoiceValues;
select
ItemPosition,
ItemName,
ItemPrice
from SourceOfInvoiceValues('./Invoice.pdf') where ItemPrice > 0
Query above will effectivelly extract table from invoice with the column you asking for based on LLM inference on requested columns and their data types.
select
csv.PostId,
csv.Comment,
gpt.Sentiment(csv.Comment) as Sentiment,
csv.Date
from #separatedvalues.csv('/home/somebody/comments_sample.csv', true, 0) csv
inner join #openai.gpt('gpt-4-1106-preview') gpt on 1 = 1
.png
or .jpg
SELECT
FullName
FROM #os.files('C:/Some/Path/To/Dir', true)
WHERE Extension = '.png' OR Extension = '.jpg'
in
operator:SELECT
FullName
FROM #os.files('C:/Some/Path/To/Dir', true)
WHERE Extension IN ('.png', '.jpg')
SELECT
ID,
Name,
DLC,
CycleTime
from #can.messages('./file.dbc')
or signals:
SELECT
Name,
ByteOrder,
Length,
StartBit,
Factor,
...
from #can.signals('./file.dbc')
SELECT Concat(Column1, Column2) as ConcatenatedColumn from #separatedvalues.csv('./file.csv', true, 0)
SELECT
DirectoryName,
Sum(Length) / 1024 / 1024 as 'MB',
Min(Length) as 'Min',
Max(Length) as 'Max',
Count(FullName) as 'CountOfFiles',
FROM #os.files('', true)
GROUP BY DirectoryName
report
in his name:SELECT
*
FROM #os.files('', true)
WHERE Name like '%report%'
SELECT
FullName
FROM #os.files('E:/', true)
WHERE
IsAudio() AND
HasWordThatSoundLike(Name, 'material')
SELECT
ToHex(Head(5), '|'),
ToHex(Tail(5), '|'),
ToHex(GetFileBytes(10, 5), '|')
FROM #os.files('', false)
WITH filesOfA AS (
SELECT
GetRelativeName('E:\DiffDirsTests\A') AS FullName,
Sha256File() AS ShaedFile
FROM #os.files('E:\DiffDirsTests\A', true)
), filesOfB AS (
SELECT
GetRelativeName('E:\DiffDirsTests\B') AS FullName,
Sha256File() AS ShaedFile
FROM #os.files('E:\DiffDirsTests\B', true)
), inBothDirs AS (
SELECT
a.FullName AS FullName,
(
CASE WHEN a.ShaedFile = b.ShaedFile
THEN 'The Same'
ELSE 'Modified'
END
) AS Status
FROM filesOfA a INNER JOIN filesOfB b ON a.FullName = b.FullName
), inSourceDir AS (
SELECT
a.FullName AS FullName,
'Removed' AS Status
FROM filesOfA a LEFT OUTER JOIN filesOfB b ON a.FullName = b.FullName
), inDestinationDir AS (
SELECT
b.FullName AS FullName,
'Added' AS Status
FROM filesOfA a RIGHT OUTER JOIN filesOfB b ON a.FullName = b.FullName
)
SELECT
inBoth.FullName AS FullName,
inBoth.Status AS Status
FROM inBothDirs inBoth
UNION (FullName)
SELECT
inSource.FullName AS FullName,
inSource.Status AS Status
FROM inSourceDir inSource
UNION (FullName)
SELECT
inDest.FullName AS FullName,
inDest.Status AS Status
FROM inDestinationDir inDest
SELECT
(
CASE WHEN SourceFile IS NOT NULL
THEN SourceFileRelative
ELSE DestinationFileRelative
END
) AS FullName,
(
CASE WHEN State = 'TheSame'
THEN 'The Same'
ELSE State
END
) AS Status
FROM #os.dirscompare('E:\DiffDirsTests\A', 'E:\DiffDirsTests\B')
SELECT
DirectoryName,
AggregateValues(Name)
FROM #os.files('E:/', true)
WHERE IsZipArchive()
GROUP BY DirectoryName
SELECT
FullName
FROM #os.files('', true)
WHERE ToDecimal(Length) / 1024 / 1024 / 1024 > 1
.png
file through OCR plugin.SELECT
ocr.GetText(file.FullName) as text
FROM
#os.files('E:/Path/To/Directory', false) file
INNER JOIN
#ocr.single() ocr
ON 1 = 1
WHERE files.Extension = '.png'
SELECT Value FROM #system.range(1, 10)
You can easily plug-in your own data source. There is fairly simple plugin api that all sources use. To read in details how to do it, jump into wiki section of this repo click.
Developed out of a need for a versatile tool that could query various data sources with SQL syntax, Musoq aims to minimize the effort and time required for data querying and analysis.
As the language looks like sql, it doesn't mean it is fully SQL compliant. It uses SQL syntax and repeats some of it's behaviour however, some differences may appear. It will also implement some experimental syntax and behaviours that are not used by database engines.
Musoq is licensed under the MIT License - see the LICENSE file for details